Abstract. We present a randomized approximation algorithm for counting contingency tables, $m \times n$ non-negative integer matrices with given row sums $R = (r_1, \ldots, r_m)$ and column sums $C = (c_1, \ldots, c_n)$. We define smooth margins $(R, C)$ in terms of the typical table and prove that for such margins the algorithm has quasi-polynomial $N^{O(\ln N)}$ complexity, where $N = r_1 + \cdots + r_m = c_1 + \cdots + c_n$. Various classes of margins are smooth, e.g., when $m = O(n)$, $n = O(m)$ and the ratios between the largest and the smallest row sums, as well as between the largest and the smallest column sums, are strictly smaller than the golden ratio $(1 + \sqrt{5})/2 \approx 1.618$. The algorithm builds on Monte Carlo integration and sampling algorithms for log-concave densities, the matrix scaling algorithm, the permanent approximation algorithm, and an integral representation for the number of contingency tables.



1. Introduction 

Let $R = (r_1, \ldots, r_m)$ and $C = (c_1, \ldots, c_n)$ be positive integer vectors such that

$$\sum_{i=1}^m r_i = \sum_{j=1}^n c_j = N.$$

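A contingency table with margins $(R, C)$ is an $m \times n$ non-negative integer matrix $D = (d_{ij})$ with row sums $R$ and column sums $C$:

$$\sum_{j=1}^n d_{ij} = r_i \quad \text{for } i = 1, \ldots, m \qquad \text{and} \qquad \sum_{i=1}^m d_{ij} = c_j \quad \text{for } j = 1, \ldots, n.$$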



Key words and phrases. Contingency tables, randomized approximation algorithm, matrix 
scaling algorithm, permanent approximation algorithm. 
Let $\#(R, C)$ denote the number of these contingency tables.

There is interest in the study of $\#(R, C)$, due to connections to statistics, combinatorics and representation theory, see, e.g., [Go76], [DE85], [DG95], [D+97], [Mo02], [CD03], [L+04], [B+04], [C+05] and the references therein. However, since computing $\#(R, C)$ is a #P-complete problem even for $m = 2$ [D+97], one does not expect to find polynomial-time algorithms (or formulas) computing $\#(R, C)$ exactly. As a result, attention has turned to the open problem of efficiently estimating $\#(R, C)$.

We present a randomized algorithm for approximating $\#(R, C)$ within a prescribed relative error. Based on earlier numerical studies [Yo07], [B+07], we conjecture that its complexity is polynomial in $N$. We provide further evidence for this hypothesis: we introduce "smooth margins" $(R, C)$, where the entries of the typical table are not too large and among $\{r_1, \ldots, r_m, c_1, \ldots, c_n\}$ there are no "outliers". Our main result is that smoothness implies a quasi-polynomial $N^{O(\ln N)}$ complexity bound on the algorithm. More precisely, we approximate $\#(R, C)$ within relative error $\epsilon > 0$ using $(1/\epsilon)^{O(1)} N^{O(\ln N)}$ time in the unit cost model, provided

$$\epsilon > 2^{-m} + 2^{-n}.^{1}$$

The class of smooth margins captures a number of interesting subclasses. In particular, this work applies to the case of magic squares (where $m = n$ and $r_i = c_j = t$ for all $i, j$), extending [B+07]. More generally, smoothness includes the case when the ratios $m/n$ and $n/m$ are bounded by a constant fixed in advance, while the ratios between the largest and the smallest row sums, as well as between the largest and the smallest column sums, are smaller than the golden ratio $(1 + \sqrt{5})/2 \approx 1.618$. These and other examples are explicated in Section 3. See Section 1.4 for comparisons to the literature.

(1.1) An outline of the algorithm. Our algorithm builds on the technique 
of rapidly mixing Markov chains and, in particular, on efficient integration and 
sampling from log-concave densities, as developed in [AK91], [F+94], [FK99], [LV06] 
(see also [Ve05] for a survey), the permanent approximation algorithm [J+04], the 
strongly polynomial time algorithm for matrix scaling [L+00], and the integral 
representation of $\#(R, C)$ from [Ba08].

Let $\Delta = \Delta_{m \times n} \subset \mathbb{R}^{mn}$ be the open $(mn - 1)$-dimensional simplex of all $m \times n$ positive matrices $X = (x_{ij})$ such that

$$\sum_{ij} x_{ij} = 1.$$

Let $dX$ be the Lebesgue measure on $\Delta$ normalized to the probability measure. An integral representation for $\#(R, C)$ was found in [Ba08]:

$$\#(R, C) = \int_{\Delta} f(X)\, dX, \tag{1.1.1}$$

¹ If an exponentially small relative error $\epsilon = O(2^{-m} + 2^{-n})$ is desired, one has an exact dynamic programming algorithm with $N^{O(m+n)} = (1/\epsilon)^{O(\ln N)}$ quasi-polynomial complexity.
where $f : \Delta \to \mathbb{R}_+$ is a certain continuous function that factors as

$$f = p\, \phi, \tag{1.1.2}$$

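where

$$p(X) \geq 1 \quad \text{for all } X \in \Delta$$

is a function that "does not vary much", and $\phi : \Delta \to \mathbb{R}_+$ is continuous and log-concave, that is,

$$\phi(\alpha X + \beta Y) \geq \phi^{\alpha}(X)\, \phi^{\beta}(Y) \quad \text{for all } X, Y \in \Delta$$

and for all $\alpha, \beta \geq 0$ such that $\alpha + \beta = 1$.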

Full details about $f$ and its factorization are reviewed in Section 2.

For any $X \in \Delta$, the values $p(X)$ and $\phi(X)$ can be approximated efficiently. Given $\epsilon > 0$, the value of $p(X)$ can be computed within relative error $\epsilon$ in time polynomial in $1/\epsilon$ and $N$ by a randomized algorithm of [J+04]. The value of $\phi(X)$ can be computed within relative error $\epsilon$ in time polynomial in $\ln(1/\epsilon)$ and $N$ by a deterministic algorithm of [L+00].

The central idea of this paper is to define smooth margins $(R, C)$ so that matrices $X \in \Delta$ with large values of $p(X)$ do not contribute much to the integral (1.1.1). Our main results, precisely stated in Section 3, are that for smooth margins there is a threshold $\tau = N^{\delta \ln N}$ for some constant $\delta > 0$ (depending on the class of margins considered) such that if we define the truncation $\bar{p} : \Delta \to \mathbb{R}_+$ by

$$\bar{p}(X) = \begin{cases} p(X) & \text{if } p(X) \leq \tau, \\ \tau & \text{if } p(X) > \tau, \end{cases}$$



then

$$\#(R, C) = \int_{\Delta} p(X)\phi(X)\, dX \approx \int_{\Delta} \bar{p}(X)\phi(X)\, dX, \tag{1.1.3}$$

where "$\approx$" means "approximates to within an $O(2^{-n} + 2^{-m})$ relative error" (in fact, rather than base 2, any constant $M > 1$, fixed in advance, can be used). We conjecture that one can choose the threshold $\tau = N^{O(1)}$, which would make the complexity of our algorithm polynomial in $N$.

The first step (and a simplified version) of our algorithm computes the integral

$$\int_{\Delta} \phi(X)\, dX \tag{1.1.4}$$

using any of the aforementioned randomized polynomial-time algorithms for integrating log-concave densities; these results imply that this step has complexity polynomial in $N$. By (1.1.3), and since $1 \leq \bar{p} \leq \tau$, it follows that for smooth $(R, C)$ the integral (1.1.4) approximates $\#(R, C)$ within a factor of $N^{O(\ln N)}$. This simplified algorithm is suggested in [Ba08]; an implementation that utilizes a version of the hit-and-run algorithm of [LV06], together with numerical results, is described in [Yo07] and [B+07].

Next, our algorithm estimates (1.1.3) within relative error $\epsilon$ using the aforementioned randomized polynomial-time algorithm for approximating the permanent of a matrix, and any of those for sampling from log-concave densities. Specifically, let $\nu$ be the probability measure on $\Delta$ with density proportional to $\phi$. Thus,

$$\int_{\Delta} \bar{p}(X)\phi(X)\, dX = \left(\int_{\Delta} \bar{p}\, d\nu\right)\left(\int_{\Delta} \phi(X)\, dX\right).$$

The second factor is computed by the first step above, while the first factor is approximated by the sample mean

$$\int_{\Delta} \bar{p}\, d\nu \approx \frac{1}{k}\sum_{i=1}^{k} \bar{p}(X_i), \tag{1.1.5}$$

where $X_1, \ldots, X_k \in \Delta$ are independent points sampled at random from the measure $\nu$. Since $1 \leq \bar{p}(X) \leq \tau$, the Chebyshev inequality implies that to achieve relative error $\epsilon$ with probability $2/3$ it suffices to sample $k = O(\epsilon^{-2}\tau^2) = \epsilon^{-2} N^{O(\ln N)}$ points in (1.1.5).

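A minimal Python sketch of the estimator (1.1.5), under stated assumptions: the sampler for $\nu$ and the evaluator of $p$ are passed in as hypothetical callables (in the paper they come from the log-concave samplers cited above and from the permanent algorithm of [J+04]), and the sample size uses the crude Chebyshev bound $\mathrm{Var} \leq \tau^2$, mean $\geq 1$, quoted above. The function names are ours, not the paper's.

```python
import math
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def truncate(p_value: float, tau: float) -> float:
    """Truncation p-bar of (1.1.3): keep p(X) when p(X) <= tau, else return tau."""
    return p_value if p_value <= tau else tau


def chebyshev_sample_size(eps: float, tau: float, failure: float = 1 / 3) -> int:
    """Sample size k with P(relative error > eps) <= failure, by Chebyshev.

    For values in [1, tau] we use Var <= tau**2 and mean >= 1, so the failure
    probability is at most tau**2 / (k * eps**2).
    """
    return math.ceil(tau ** 2 / (failure * eps ** 2))


def estimate_pbar_mean(sample_nu: Callable[[], T], p: Callable[[T], float],
                       eps: float, tau: float) -> float:
    """Sample mean (1.1.5): average of the truncated p over k i.i.d. draws from nu."""
    k = chebyshev_sample_size(eps, tau)
    total = 0.0
    for _ in range(k):
        total += truncate(p(sample_nu()), tau)
    return total / k


if __name__ == "__main__":
    # Toy stand-ins, NOT the paper's objects: nu = uniform on [0, 1],
    # p(x) = 1 + x and tau = 1.5, so the exact mean of p-bar is 1.375.
    random.seed(0)
    print(estimate_pbar_mean(random.random, lambda x: 1.0 + x, eps=0.05, tau=1.5))
```

The toy demo only exercises the truncation and averaging arithmetic; the quadratic dependence of $k$ on $\tau$ is what turns $\tau = N^{\delta \ln N}$ into the quasi-polynomial running time.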
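The results of [AK91], [F+94], [FK99], and [LV06] imply that for any given $\epsilon > 0$ one can sample independent points $X_1, \ldots, X_k$ from a distribution $\tilde{\nu}$ on $\Delta$ such that

$$|\tilde{\nu}(S) - \nu(S)| < \epsilon \quad \text{for any Borel set } S \subset \Delta,$$

in time linear in $k$ and polynomial in $\epsilon^{-1}$ and $N$. Since $1 \leq \bar{p} \leq \tau$, replacing $\nu$ by $\tilde{\nu}$ in (1.1.5) introduces an additional relative error of at most $\epsilon\tau = \epsilon N^{\delta \ln N}$, which is handled by running the sampler with the smaller accuracy parameter $\epsilon N^{-\delta \ln N}$.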

(1.2) An optimization problem, typical tables and smooth margins. We will define smoothness of margins in terms of a certain convex optimization problem.

Let $V = V(R, C)$ be the transportation polytope of $m \times n$ non-negative matrices $X = (x_{ij})$ with row sums $R$ and column sums $C$. On the space $\mathbb{R}^{mn}_+$ of $m \times n$ non-negative matrices define

$$g(X) = \sum_{ij} \Bigl((x_{ij} + 1)\ln(x_{ij} + 1) - x_{ij}\ln x_{ij}\Bigr) \quad \text{for } X = (x_{ij}).$$

The following optimization problem plays an important role in this paper:

$$\text{Maximize } g(X) \text{ subject to } X \in V. \tag{1.2.1}$$

It is easy to check that $g$ is strictly concave and hence attains its maximum on $V$ at a unique matrix $X^* = (x^*_{ij}) \in V$, which we call the typical table.
An intuitive explanation for the appearance of this optimization problem, and the justification for the name "typical", derive from [B07b] (the relevant parts are reproduced for convenience in Section 4; see specifically Theorem 4.1). In short, $X^*$ determines the asymptotic behavior of $\#(R, C)$.

The main requirement that we demand of smooth margins $(R, C)$ (see Section 3 for the full technical conditions) is that the entries of the typical table are not too large, that is, the entries $x^*_{ij}$ of the optimal solution $X^* = (x^*_{ij})$ satisfy

$$\max_{ij} x^*_{ij} = O(s), \quad \text{where } s = \frac{N}{mn}$$

is the average entry of the table.

Viewing the typical table as interesting in its own right, one would like to understand how it changes as the margins vary. Since the optimization problem is convex, $X^*$ can be computed efficiently by many existing algorithms, see, for example, [NN94]. However, in many instances of interest, the smoothness condition can be checked without actually solving this problem. For example, if all the row sums are equal, the symmetry of the functional $g$ under permutations of rows implies that

$$x^*_{ij} = \frac{c_j}{m} \quad \text{for all } i, j.$$

In general, the entries $x^*_{ij}$ stay small if the row sums $r_i$ and the column sums $c_j$ do not vary much. On the other hand, it is not hard to construct examples of margins $(R, C)$ with $n$-vectors $R$ and $C$ such that $n \leq r_i, c_j \leq 3n$ and some of the entries $x^*_{ij}$ are large, in fact linear in $n$. Another of our results (Theorem 3.5) gives upper and lower bounds for $x^*_{ij}$ in terms of $(R, C)$.

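The text only says that $X^*$ can be found by standard convex optimization [NN94]. As a hedged illustration (our own choice of method, not the paper's), the sketch below uses the stationarity condition $\ln((x_{ij}+1)/x_{ij}) = a_i + b_j$ of (1.2.1), i.e. $x_{ij} = u_i v_j/(1 - u_i v_j)$, and alternately solves for the row factors $u_i$ and column factors $v_j$ by bisection so that each row or column sum is matched. The names `typical_table` and `_solve_factor` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _solve_factor(target: float, others: Sequence[float], iters: int = 100) -> float:
    """Bisection for t in (0, 1/max(others)) with sum_k t*o_k / (1 - t*o_k) = target.

    The left-hand side increases from 0 to +infinity on that interval.
    """
    lo, hi = 0.0, 1.0 / max(others)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = 0.0
        for o in others:
            q = mid * o
            if q >= 1.0:  # guard against rounding at the right endpoint
                total = math.inf
                break
            total += q / (1.0 - q)
        if total < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def typical_table(R: Sequence[float], C: Sequence[float], sweeps: int = 2000,
                  tol: float = 1e-10) -> Tuple[List[List[float]], List[float], List[float]]:
    """Approximate X* = argmax g over V(R, C) by alternating dual updates.

    Stationarity gives ln((x_ij + 1) / x_ij) = a_i + b_j, i.e.
    x_ij = u_i v_j / (1 - u_i v_j) with u_i = exp(-a_i), v_j = exp(-b_j).
    Each sweep matches all row sums, then re-matches all column sums.
    """
    m, n = len(R), len(C)
    u, v = [0.5] * m, [0.5] * n
    X: List[List[float]] = []
    for _ in range(sweeps):
        u = [_solve_factor(r, v) for r in R]
        v = [_solve_factor(c, u) for c in C]
        X = [[u[i] * v[j] / (1.0 - u[i] * v[j]) for j in range(n)] for i in range(m)]
        # column sums are exact after the v-update; stop once rows are too
        if max(abs(sum(X[i]) - R[i]) for i in range(m)) < tol:
            break
    return X, u, v


if __name__ == "__main__":
    X, _, _ = typical_table([3, 3, 3], [2, 4, 3])
    for row in X:
        print([round(x, 6) for x in row])  # equal row sums: x*_ij = c_j / m
```

With equal row sums the sketch reproduces $x^*_{ij} = c_j/m$, and rows with equal sums receive identical entries, matching the symmetry argument in the text.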
(1.4) Comparisons with the literature. Using the Markov chain Monte Carlo approach, Dyer, Kannan, and Mount [D+97] count contingency tables when $R$ and $C$ are sufficiently large, that is, if $r_i = \Omega(n^2 m)$ and $c_j = \Omega(m^2 n)$ for all $i, j$. Their randomized (sampling) algorithm approximates $\#(R, C)$ within any given relative error $\epsilon > 0$ in time polynomial in $\epsilon^{-1}$, $n$, $m$, and $\sum_i \log r_i + \sum_j \log c_j$ (the bit size of the margins). Subsequently, Morris [Mo02] obtained a similar result for the bounds $r_i = \Omega(n^{3/2} m \ln m)$ and $c_j = \Omega(m^{3/2} n \ln n)$. These results are based on the fact that for large margins, the number of contingency tables is well approximated by the volume of the transportation polytope $V(R, C)$ (contingency tables being the integer points in this polytope). More generally, Kannan and Vempala [KV99] show that estimating the number of integer points in a $d$-dimensional polytope with $m$ facets reduces to computing the volume of the polytope (a problem for which efficient randomized algorithms exist, see [Ve05] for a survey), provided the polytope contains a ball of radius $d\sqrt{\log m}$.

When the margins $r_i, c_j$ are very small (that is, bounded by a constant fixed in advance) relative to the sizes $m$ and $n$ of the matrix, Békéssy, Békéssy, and Komlós [B+72] obtain an efficient and precise asymptotic formula for $\#(R, C)$. Their formula exploits the fact that in this case the majority of contingency tables have only entries 0, 1, and 2. Alternatively, in this case one can compute $\#(R, C)$ exactly in time polynomial in $m + n$ via a dynamic programming algorithm. More recently, Greenhill and McKay [GM07] gave a computationally efficient asymptotic formula for a wider class of sparse margins (when $r_i c_j = o(N^{2/3})$).

Also using the dynamic programming approach, Cryan and Dyer [CD03] construct a randomized polynomial-time approximation algorithm to compute $\#(R, C)$, provided the number of rows is fixed; see [C+06] for a sharpening of the results.

It seems that the most challenging case of computing $\#(R, C)$ is where both $m$ and $n$ grow and the margins are of moderate size, e.g., linear in the dimension. Recently, Canfield and McKay [CM07] found a precise asymptotic formula for $\#(R, C)$ assuming that all row sums are equal and all column sums are equal. However, for general margins no such formula is known, even conjecturally.

We remark that our notion of smooth margins includes all of the above regimes, 
except for that of large margins. 

Summarizing, although our complexity bounds do not improve on the algorithms 
in the above cases, our algorithm is provably computationally efficient (quasi- 
polynomial in N) for several new classes of margins, which include cases of growing 
dimensions m and n and moderate size margins R and C. 

2. The integral representation for the number of contingency tables

We now give details of the integral representation (1.1.1). To do this, we express $\#(R, C)$ as the expectation of the permanent of a random $N \times N$ matrix. Recall that the permanent of an $N \times N$ matrix $A = (a_{ij})$ is defined by

$$\operatorname{per} A = \sum_{\sigma \in S_N} \prod_{i=1}^{N} a_{i\sigma(i)},$$

where $S_N$ is the symmetric group of the permutations of the set $\{1, \ldots, N\}$. The following result was proved in [Ba08].

(2.1) Theorem. For an $m \times n$ matrix $X = (x_{ij})$, let $A(X)$ be the $N \times N$ block matrix whose $(i, j)$-th block is the $r_i \times c_j$ submatrix filled with $x_{ij}$, for $i = 1, \ldots, m$ and $j = 1, \ldots, n$. Then

$$\operatorname{per} A(X) = r_1! \cdots r_m!\, c_1! \cdots c_n! \sum_{D} \prod_{ij} \frac{x_{ij}^{d_{ij}}}{d_{ij}!}, \tag{2.1.1}$$

where the sum is over all non-negative integer matrices $D = (d_{ij})$ with row sums $R$ and column sums $C$.
Let $\mathbb{R}^{mn}_+$ be the open orthant of positive $m \times n$ matrices $X$. Then

$$\#(R, C) = \frac{1}{r_1! \cdots r_m!\, c_1! \cdots c_n!} \int_{\mathbb{R}^{mn}_+} \operatorname{per} A(X) \exp\Bigl\{-\sum_{ij} x_{ij}\Bigr\}\, dX,$$

where $dX$ is the Lebesgue measure on $\mathbb{R}^{mn}_+$.

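Identity (2.1.1) can be sanity-checked by brute force on tiny margins. The sketch below is our own illustration (not from [Ba08]): it builds $A(X)$, computes its permanent with Ryser's inclusion-exclusion formula, enumerates all tables $D$, and compares both sides. The helper names `permanent`, `block_matrix`, `tables` and `rhs_2_1_1` are hypothetical, and the exponential running time restricts it to very small $N$.

```python
import itertools
import math
import random
from typing import Iterator, List, Sequence


def permanent(A: Sequence[Sequence[float]]) -> float:
    """Permanent via Ryser's inclusion-exclusion formula (exponential time)."""
    n = len(A)
    total = 0.0
    for mask in range(1, 1 << n):
        cols = [j for j in range(n) if (mask >> j) & 1]
        prod = 1.0
        for row in A:
            prod *= sum(row[j] for j in cols)
        total += (-1) ** len(cols) * prod
    return (-1) ** n * total


def block_matrix(X, R, C) -> List[List[float]]:
    """A(X) of Theorem 2.1: the (i, j) block is r_i x c_j, filled with x_ij."""
    A: List[List[float]] = []
    for i, r in enumerate(R):
        row = [X[i][j] for j, c in enumerate(C) for _ in range(c)]
        A.extend(list(row) for _ in range(r))
    return A


def tables(R, C) -> Iterator[List[List[int]]]:
    """All non-negative integer matrices with row sums R and column sums C."""
    def rec(i, remaining):
        if i == len(R):
            if all(x == 0 for x in remaining):
                yield []
            return
        for row in itertools.product(*(range(c + 1) for c in remaining)):
            if sum(row) == R[i]:
                rest = [c - d for c, d in zip(remaining, row)]
                for tail in rec(i + 1, rest):
                    yield [list(row)] + tail
    yield from rec(0, list(C))


def rhs_2_1_1(X, R, C) -> float:
    """r_1!...r_m! c_1!...c_n! * sum over tables D of prod_ij x_ij^d_ij / d_ij!."""
    pref = math.prod(math.factorial(r) for r in R) * math.prod(math.factorial(c) for c in C)
    total = 0.0
    for D in tables(R, C):
        term = 1.0
        for i, row in enumerate(D):
            for j, d in enumerate(row):
                term *= X[i][j] ** d / math.factorial(d)
        total += term
    return pref * total


if __name__ == "__main__":
    random.seed(1)
    R, C = [2, 1, 2], [1, 3, 1]
    X = [[random.uniform(0.1, 1.0) for _ in C] for _ in R]
    print(permanent(block_matrix(X, R, C)), rhs_2_1_1(X, R, C))
```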
In the case that $r_i = a$ and $c_j = b$ for all $i, j$, the expansion (2.1.1) was first observed by Bang and then used by Friedland [Fr79] in his proof of a weaker form of the van der Waerden conjecture; see Section 7.1 and the references there.

Since the function $X \mapsto \operatorname{per} A(X)$ is a homogeneous polynomial of degree $N$, one can express $\#(R, C)$ as an integral over the simplex. The following corollary was also obtained in [Ba08].

(2.2) Corollary. Let $\Delta = \Delta_{m \times n} \subset \mathbb{R}^{mn}$ be the open simplex of positive $m \times n$ matrices $X = (x_{ij})$ such that $\sum_{ij} x_{ij} = 1$. Then

$$\#(R, C) = \frac{(N + mn - 1)!}{(mn - 1)!\, r_1! \cdots r_m!\, c_1! \cdots c_n!} \int_{\Delta_{m \times n}} \operatorname{per} A(X)\, dX,$$

where $dX$ is the Lebesgue measure on $\Delta_{m \times n}$ normalized to the probability measure.

Hence in the integral representation (1.1.1), we define the function $f$ by

$$f(X) = \frac{(N + mn - 1)!}{(mn - 1)!\, r_1! \cdots r_m!\, c_1! \cdots c_n!} \operatorname{per} A(X) = \frac{(N + mn - 1)!}{(mn - 1)!} \sum_{D} \prod_{ij} \frac{x_{ij}^{d_{ij}}}{d_{ij}!},$$

where $A(X)$ is the block matrix of Theorem 2.1 and the sum is over all contingency tables $D$ with margins $(R, C)$.

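To make Corollary 2.2 concrete, here is a hedged sketch of the most naive estimator it suggests: average $f(X)$ over points drawn uniformly from $\Delta$ (a Dirichlet$(1, \ldots, 1)$ point, obtained by normalizing i.i.d. Exp(1) entries). This is our own illustration, not the paper's algorithm, which instead samples from the measure $\nu \propto \phi$ and integrates $\phi$ separately; the uniform version has huge variance except for tiny margins. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def permanent(A: Sequence[Sequence[float]]) -> float:
    """Permanent via Ryser's inclusion-exclusion formula (exponential time)."""
    n = len(A)
    total = 0.0
    for mask in range(1, 1 << n):
        cols = [j for j in range(n) if (mask >> j) & 1]
        prod = 1.0
        for row in A:
            prod *= sum(row[j] for j in cols)
        total += (-1) ** len(cols) * prod
    return (-1) ** n * total


def block_matrix(X, R, C) -> List[List[float]]:
    """A(X) of Theorem 2.1: the (i, j) block is r_i x c_j, filled with x_ij."""
    A: List[List[float]] = []
    for i, r in enumerate(R):
        row = [X[i][j] for j, c in enumerate(C) for _ in range(c)]
        A.extend(list(row) for _ in range(r))
    return A


def f_integrand(X, R, C) -> float:
    """f(X) = (N+mn-1)! / ((mn-1)! r_1!...r_m! c_1!...c_n!) * per A(X)."""
    m, n, N = len(R), len(C), sum(R)
    log_const = (math.lgamma(N + m * n) - math.lgamma(m * n)
                 - sum(math.lgamma(r + 1) for r in R)
                 - sum(math.lgamma(c + 1) for c in C))
    return math.exp(log_const) * permanent(block_matrix(X, R, C))


def uniform_simplex_point(m: int, n: int, rng: random.Random) -> List[List[float]]:
    """Uniform point of Delta: i.i.d. Exp(1) entries divided by their total."""
    E = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(m)]
    total = sum(map(sum, E))
    return [[e / total for e in row] for row in E]


def estimate_count(R, C, samples: int, seed: int = 0) -> float:
    """Crude Monte Carlo for #(R, C) = integral of f over Delta (Corollary 2.2)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        acc += f_integrand(uniform_simplex_point(len(R), len(C), rng), R, C)
    return acc / samples


if __name__ == "__main__":
    print(estimate_count([1, 1], [1, 1], 20000))  # the exact count is 2
```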
(2.3) Matrix scaling and the factorization of $f$. To obtain the factorization (1.1.2), where $\phi : \Delta \to \mathbb{R}_+$ is a log-concave function and $p : \Delta \to \mathbb{R}_+$ is a function which "does not vary much", we employ the idea of matrix scaling, see [Si64], [MO68], [KK96], Chapter 6 of [BR97], and [L+00]. Let $X = (x_{ij})$ be a positive $m \times n$ matrix. Then there exists a unique $m \times n$ matrix $Y = (y_{ij})$ with row sums $R = (r_1, \ldots, r_m)$ and column sums $C = (c_1, \ldots, c_n)$ such that

$$x_{ij} = y_{ij}\lambda_i\mu_j \quad \text{for all } i, j$$

for some positive $\lambda_1, \ldots, \lambda_m, \mu_1, \ldots, \mu_n$. The numbers $\lambda_i$ and $\mu_j$ are unique up to a rescaling $\lambda_i \mapsto \lambda_i t$, $\mu_j \mapsto \mu_j t^{-1}$. Note that if we divide the entries in the $(i, j)$-th block of the matrix $A(X)$ of Theorem 2.1 by $r_i c_j \lambda_i \mu_j$, we obtain a positive doubly stochastic matrix $B(X)$, that is, a positive matrix with all row and column sums equal to 1. Thus we have

$$\operatorname{per} A(X) = \Bigl(\prod_{i=1}^{m} r_i^{r_i}\lambda_i^{r_i}\Bigr)\Bigl(\prod_{j=1}^{n} c_j^{c_j}\mu_j^{c_j}\Bigr) \operatorname{per} B(X).$$
It is proved in [Ba08] that

$$\frac{N!}{N^N} \leq \operatorname{per} B(X) \leq \min\Bigl\{\prod_{i=1}^{m} \frac{r_i!}{r_i^{r_i}},\ \prod_{j=1}^{n} \frac{c_j!}{c_j^{c_j}}\Bigr\}.$$

The lower bound is the van der Waerden bound for permanents of doubly stochastic matrices, see [Fa81], [Eg81] and also Chapter 12 of [LW01] and the recent [G06a], while the upper bound is a corollary of the Minc conjecture proved by Bregman, see [Br73], Chapter 11 of [LW01], and also [So03].
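The scaling step and the resulting identity can be checked numerically. The sketch below is hedged: it uses the simple alternating (Sinkhorn) iteration of [Si64] as an illustrative stand-in for the strongly polynomial algorithm of [L+00], builds $B(X)$ as described above, and verifies that $B(X)$ is doubly stochastic, that $\operatorname{per} A(X)$ equals the stated product times $\operatorname{per} B(X)$, and that $\operatorname{per} B(X)$ lies between the two bounds just quoted. The code's multipliers satisfy $y_{ij} = a_i b_j x_{ij}$, so $\lambda_i = 1/a_i$ and $\mu_j = 1/b_j$; all names are ours.

```python
import math
import random
from typing import List, Sequence, Tuple


def scale_to_margins(X, R, C, iters: int = 10_000, tol: float = 1e-13
                     ) -> Tuple[List[List[float]], List[float], List[float]]:
    """Alternating (Sinkhorn) scaling: a, b > 0 such that Y = (a_i b_j x_ij)
    has row sums R and column sums C. In the notation of Section 2.3,
    x_ij = y_ij lambda_i mu_j, so lambda_i = 1 / a_i and mu_j = 1 / b_j."""
    m, n = len(R), len(C)
    a, b = [1.0] * m, [1.0] * n
    for _ in range(iters):
        for i in range(m):
            a[i] = R[i] / sum(X[i][j] * b[j] for j in range(n))
        for j in range(n):
            b[j] = C[j] / sum(X[i][j] * a[i] for i in range(m))
        # column sums are exact now; stop once the row sums are exact too
        err = max(abs(sum(a[i] * b[j] * X[i][j] for j in range(n)) - R[i]) for i in range(m))
        if err < tol * max(R):
            break
    Y = [[a[i] * b[j] * X[i][j] for j in range(n)] for i in range(m)]
    return Y, a, b


def doubly_stochastic_B(Y, R, C) -> List[List[float]]:
    """B(X): the (i, j) block of A(X) divided by r_i c_j lambda_i mu_j,
    i.e. an r_i x c_j block filled with y_ij / (r_i c_j)."""
    B: List[List[float]] = []
    for i, r in enumerate(R):
        row = [Y[i][j] / (r * c) for j, c in enumerate(C) for _ in range(c)]
        B.extend(list(row) for _ in range(r))
    return B


def permanent(A: Sequence[Sequence[float]]) -> float:
    """Permanent via Ryser's formula (exponential time; tiny N only)."""
    n = len(A)
    total = 0.0
    for mask in range(1, 1 << n):
        cols = [j for j in range(n) if (mask >> j) & 1]
        prod = 1.0
        for row in A:
            prod *= sum(row[j] for j in cols)
        total += (-1) ** len(cols) * prod
    return (-1) ** n * total


if __name__ == "__main__":
    random.seed(2)
    R, C = [2, 1, 2], [1, 3, 1]
    X = [[random.uniform(0.1, 1.0) for _ in C] for _ in R]
    Y, a, b = scale_to_margins(X, R, C)
    print(permanent(doubly_stochastic_B(Y, R, C)))
```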
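Now we define

$$p(X) = \frac{N^N}{N!} \operatorname{per} B(X) \tag{2.3.1}$$

and $\phi(X) = f(X)/p(X)$, so that (1.1.2) holds; explicitly, by Corollary 2.2 and the formula for $\operatorname{per} A(X)$ above,

$$\phi(X) = \frac{(N + mn - 1)!\, N!}{(mn - 1)!\, N^N} \prod_{i=1}^{m} \frac{r_i^{r_i}\lambda_i^{r_i}}{r_i!} \prod_{j=1}^{n} \frac{c_j^{c_j}\mu_j^{c_j}}{c_j!}.$$

We summarize results of [Ba08] regarding $p$ and $\phi$.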

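(2.4) Theorem. The following hold:

(1) $\phi$ is log-concave, that is,

$$\phi(\alpha X + \beta Y) \geq \phi^{\alpha}(X)\, \phi^{\beta}(Y)$$

for all $X, Y \in \Delta$ and $\alpha, \beta \geq 0$ such that $\alpha + \beta = 1$;

(2) Let $X = (x_{ij})$ and $Y = (y_{ij})$ be positive $m \times n$ matrices such that $x_{ij}, y_{ij} \geq \delta$ for all $i, j$ and some $\delta > 0$. Then

$$|\ln\phi(X) - \ln\phi(Y)| \leq \frac{N}{\delta}\max_{ij}|x_{ij} - y_{ij}|;$$

(3) For $\delta < 1/mn$ let us define the $\delta$-interior $\Delta_\delta$ of the simplex $\Delta$ as the set of matrices $X \in \Delta$, $X = (x_{ij})$, such that $x_{ij} > \delta$ for all $i, j$. Then for $f = p\phi$ we have

$$(1 - mn\delta)^{N + mn - 1}\int_{\Delta} f\, dX \leq \int_{\Delta_\delta} f\, dX \leq \int_{\Delta} f\, dX;$$

(4) We have

$$1 \leq p(X) \leq \frac{N^N}{N!}\min\Bigl\{\prod_{i=1}^{m}\frac{r_i!}{r_i^{r_i}},\ \prod_{j=1}^{n}\frac{c_j!}{c_j^{c_j}}\Bigr\}.$$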



The log-concavity of the function $\phi$ was first observed in [G06b]. In terms of [G06b], up to a normalization factor, $\phi(X)$ is the capacity of the matrix $A(X)$ of Theorem 2.1; see also [B07b] for a more general family of inequalities satisfied by $\phi$. As is discussed in [Ba08], the matrix scaling algorithm of [L+00] leads to a polynomial-time algorithm for computing $\phi(X)$. Namely, for any given $\epsilon > 0$, the value of $\phi(X)$ can be computed within relative error $\epsilon$ in time polynomial in $N$ and $\ln(1/\epsilon)$ in the unit cost model; our own experience is that this algorithm for computing $\phi(X)$ is practical and works well for $m, n \leq 100$.

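A hedged sketch of how $\ln\phi(X)$ can be evaluated from a matrix scaling, using the explicit formula for $\phi = f/p$ in Section 2.3 with $\lambda_i = 1/a_i$, $\mu_j = 1/b_j$. We substitute plain Sinkhorn iteration for the algorithm of [L+00], and the checks are only numerical illustrations: Part (1) of Theorem 2.4 (log-concavity on $\Delta$) on random pairs, and degree-$N$ homogeneity, $\phi(tX) = t^N\phi(X)$, which follows from the formula because scaling $X$ by $t$ scales each product $\lambda_i\mu_j$ by $t$. Function names are ours.

```python
import math
import random
from typing import List, Tuple


def scale_factors(X, R, C, iters: int = 10_000, tol: float = 1e-13) -> Tuple[List[float], List[float]]:
    """Sinkhorn iteration: a, b > 0 with (a_i b_j x_ij) having margins (R, C)."""
    m, n = len(R), len(C)
    a, b = [1.0] * m, [1.0] * n
    for _ in range(iters):
        for i in range(m):
            a[i] = R[i] / sum(X[i][j] * b[j] for j in range(n))
        for j in range(n):
            b[j] = C[j] / sum(X[i][j] * a[i] for i in range(m))
        err = max(abs(sum(a[i] * b[j] * X[i][j] for j in range(n)) - R[i]) for i in range(m))
        if err < tol * max(R):
            break
    return a, b


def log_phi(X, R, C) -> float:
    """ln phi(X) for phi = f / p, with lambda_i = 1/a_i and mu_j = 1/b_j:
    phi = (N+mn-1)! N! / ((mn-1)! N^N) * prod_i (r_i lambda_i)^r_i / r_i!
          * prod_j (c_j mu_j)^c_j / c_j!."""
    m, n, N = len(R), len(C), sum(R)
    a, b = scale_factors(X, R, C)
    val = math.lgamma(N + m * n) - math.lgamma(m * n) + math.lgamma(N + 1) - N * math.log(N)
    val += sum(r * math.log(r / a[i]) - math.lgamma(r + 1) for i, r in enumerate(R))
    val += sum(c * math.log(c / b[j]) - math.lgamma(c + 1) for j, c in enumerate(C))
    return val


def random_simplex_point(m: int, n: int, rng: random.Random) -> List[List[float]]:
    """Uniform point of the simplex Delta (normalized i.i.d. Exp(1) entries)."""
    E = [[rng.expovariate(1.0) for _ in range(n)] for _ in range(m)]
    total = sum(map(sum, E))
    return [[e / total for e in row] for row in E]


if __name__ == "__main__":
    rng = random.Random(3)
    print(log_phi(random_simplex_point(3, 3, rng), [3, 1, 2], [2, 2, 2]))
```

Working with $\ln\phi$ rather than $\phi$ avoids overflow in the factorial prefactors even for moderate $N$.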
Theorems 2.4 and 2.1 allow us to apply the algorithms of [AK91], [F+94], [FK99], and [LV06] for efficient integration and sampling of log-concave functions. First, for any given $\epsilon > 0$, one can compute the integral

$$\int_{\Delta} \phi(X)\, dX$$

within relative error $\epsilon$ in time polynomial in $\epsilon^{-1}$ and $N$ by a randomized algorithm. Second, one can sample points $X_1, \ldots, X_k \in \Delta$ independently from a measure $\tilde{\nu}$ such that

$$|\tilde{\nu}(S) - \nu(S)| < \epsilon \quad \text{for any Borel set } S \subset \Delta,$$

where $\nu$ is the measure with density proportional to $\phi$, in time polynomial in $k$, $\epsilon^{-1}$ and $N$.

The integration of $p(X)$ raises a greater challenge. For any given $\epsilon > 0$ one can compute $p(X)$ itself within relative error $\epsilon$ in time polynomial in $\epsilon^{-1}$ and $N$, using the permanent approximation algorithm of [J+04]. However, the upper bound of Part (4) of Theorem 2.4 is, in the worst case, of order $N^{\gamma(m+n)}$ for some absolute constant $\gamma > 0$. Therefore, a priori, to integrate $p$ over $\Delta$ using a sample mean, one needs too many such computations to guarantee the desired accuracy $\epsilon$. Our main observation to overcome this problem is that in many interesting cases the matrices $X \in \Delta$ with large values of $p(X)$ do not contribute much to the integral (1.1.1), so we have $p(X) = N^{O(\ln N)}$ with high probability with respect to the density on $\Delta$ proportional to $f$.

(2.5) Bounding $p$ with high probability. Let us consider the projection

$$\operatorname{pr} : \mathbb{R}^{mn}_+ \to \Delta_{m \times n}, \qquad \operatorname{pr}(X) = \hat{X}, \quad \text{where } \hat{x}_{ij} = \frac{x_{ij}}{\sum_{kl} x_{kl}}.$$

Clearly, the scalings of $X$ and $\hat{X}$ to the matrix with row sums $R$ and column sums $C$ coincide. It is also clear that the doubly stochastic scalings $B(X)$ and $B(\hat{X})$ of the matrices $A(X)$ and $A(\hat{X})$, respectively, coincide. We define $p(X)$ for an arbitrary positive $m \times n$ matrix $X$ by $p(X) := p(\hat{X})$, or, equivalently, by (2.3.1).
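We introduce the following density $\psi = \psi_{R,C}$ on $\mathbb{R}^{mn}$:

$$\psi(X) = \frac{1}{\#(R, C)} \sum_{D} \prod_{ij} \frac{x_{ij}^{d_{ij}} e^{-x_{ij}}}{d_{ij}!}, \quad \text{where } X = (x_{ij}) \text{ and } x_{ij} > 0 \text{ for all } i, j,$$

and the sum is over all $m \times n$ non-negative integer matrices $D$ with row sums $R$ and column sums $C$. We define $\psi(X) = 0$ if $X$ is not a positive matrix. That $\psi$ is a probability density is immediate from Theorem 2.1.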

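Since $x^d e^{-x}/d!$ is the Gamma$(d+1, 1)$ density, the formula for $\psi$ says that $\psi$ is a uniform mixture, over all tables $D$, of products of independent Gamma$(d_{ij}+1, 1)$ densities. This observation (ours, read off from the formula) gives an exact sampler for tiny margins: pick $D$ uniformly, then draw each $x_{ij}$ independently. The sketch below is a brute-force illustration only, since it enumerates all tables; `sample_psi` and `project` are hypothetical names, and `project` implements the map pr.

```python
import itertools
import random
from typing import Iterator, List, Optional, Sequence


def tables(R: Sequence[int], C: Sequence[int]) -> Iterator[List[List[int]]]:
    """All non-negative integer matrices with row sums R and column sums C."""
    def rec(i, remaining):
        if i == len(R):
            if all(x == 0 for x in remaining):
                yield []
            return
        for row in itertools.product(*(range(c + 1) for c in remaining)):
            if sum(row) == R[i]:
                rest = [c - d for c, d in zip(remaining, row)]
                for tail in rec(i + 1, rest):
                    yield [list(row)] + tail
    yield from rec(0, list(C))


def sample_psi(R, C, rng: random.Random,
               all_tables: Optional[List[List[List[int]]]] = None) -> List[List[float]]:
    """One draw from psi_{R,C}: a uniformly random table D, then independent
    x_ij ~ Gamma(shape d_ij + 1, scale 1), whose density is x^d e^{-x} / d!."""
    if all_tables is None:
        all_tables = list(tables(R, C))
    D = rng.choice(all_tables)
    return [[rng.gammavariate(d + 1, 1.0) for d in row] for row in D]


def project(X: List[List[float]]) -> List[List[float]]:
    """pr(X) = X-hat of Section 2.5: divide every entry by the total sum."""
    total = sum(map(sum, X))
    return [[x / total for x in row] for row in X]


if __name__ == "__main__":
    rng = random.Random(4)
    print(project(sample_psi([2, 1], [1, 2], rng)))
```

By the discussion below, the projected point $\operatorname{pr}(X)$ then has density $f/\#(R, C)$ on $\Delta$.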
Our goal is to show that for smooth margins $(R, C)$, the value of $p(X)$ is "reasonably small" for most $X$, that is,

$$\mathbf{P}\bigl\{X \in \mathbb{R}^{mn}_+ : p(X) > N^{\delta \ln N}\bigr\} \leq \kappa\,(2^{-m} + 2^{-n}) \tag{2.5.1}$$

for some constants $\delta > 0$ and $\kappa > 0$, where the probability is measured with respect to the density $\psi$.

Our construction of the function $f$ in (1.1.1) implies that the push-forward of $\psi$ under the projection $\operatorname{pr} : \mathbb{R}^{mn}_+ \to \Delta$ is the density

$$\frac{1}{\#(R, C)} f(X) \quad \text{for } X \in \Delta$$

on the simplex. Hence inequality (2.5.1) implies that for $\tau = N^{\delta \ln N}$ we have

$$\frac{1}{\#(R, C)} \int_{X \in \Delta :\ p(X) > \tau} f(X)\, dX \leq \kappa\,(2^{-m} + 2^{-n}).$$

Therefore, as discussed in Section 1.1, replacing $p$ by its truncation $\bar{p}$ introduces an $O(2^{-n} + 2^{-m})$ relative error in (1.1.3), and hence our algorithm achieves quasi-polynomial complexity.

The key idea behind inequality (2.5.1) is that the permanent of an appropriately defined "random" doubly stochastic matrix is, with high probability, very close to the van der Waerden lower bound $N!/N^N$; see Lemma 5.1.

3. Main results 



Now we are ready to precisely define the classes of smooth margins for which our algorithm achieves $N^{O(\ln N)}$ complexity.
(3.1) Smoothness definitions. Fix margins $R = (r_1, \ldots, r_m)$, $C = (c_1, \ldots, c_n)$, where

$$\sum_{i=1}^m r_i = \sum_{j=1}^n c_j = N.$$

Let

$$s = \frac{N}{mn}$$

be the average value of the entries of the table. We define

$$r_+ = \max_{i=1,\ldots,m} r_i, \quad r_- = \min_{i=1,\ldots,m} r_i, \quad c_+ = \max_{j=1,\ldots,n} c_j, \quad c_- = \min_{j=1,\ldots,n} c_j.$$

Hence $r_+$ and $c_+$ are the largest row and column sums, respectively, and $r_-$ and $c_-$ are the smallest row and column sums, respectively.

For $s_0 > 0$, call the margins $(R, C)$ $s_0$-moderate if $s \leq s_0$. In other words, margins are moderate if the average entry of the table is bounded from above.

For $\alpha \geq 1$, the margins $(R, C)$ are upper $\alpha$-smooth if

$$r_+ \leq \alpha s n = \alpha\frac{N}{m} \quad \text{and} \quad c_+ \leq \alpha s m = \alpha\frac{N}{n}.$$

Thus, margins are upper smooth if the row and column sums are at most proportional to the average row and column sums, respectively.

For $0 < \beta \leq 1$, the margins $(R, C)$ are lower $\beta$-smooth if

$$r_- \geq \beta s n = \beta\frac{N}{m} \quad \text{and} \quad c_- \geq \beta s m = \beta\frac{N}{n}.$$

Therefore, margins are lower smooth if the row and column sums are at least proportional to the average row and column sums, respectively.

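The key smoothness condition is as follows: for $\alpha \geq 1$ we define the margins $(R, C)$ to be strongly upper $\alpha$-smooth if for the typical table $X^* = (x^*_{ij})$ we have

$$x^*_{ij} \leq \alpha s \quad \text{for all } i, j.$$

Note that this latter condition implies that the margins are upper $\alpha$-smooth, since $r_i = \sum_j x^*_{ij} \leq \alpha s n$ and $c_j = \sum_i x^*_{ij} \leq \alpha s m$. (Also, we do not need a notion of strongly lower $\beta$-smooth margins.)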

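The definitions of Section 3.1 translate directly into a small checker. The sketch below is our own; `smoothness_report` is a hypothetical name, and the typical table, needed only for the strong condition, must be supplied by the caller (e.g., from the symmetry considerations of Section 3.4 or from a convex solver).

```python
from typing import Dict, Optional, Sequence


def smoothness_report(R: Sequence[int], C: Sequence[int], alpha: float, beta: float,
                      s0: float, X_star: Optional[Sequence[Sequence[float]]] = None
                      ) -> Dict[str, object]:
    """Check the conditions of Section 3.1 for the margins (R, C)."""
    m, n, N = len(R), len(C), sum(R)
    if N != sum(C):
        raise ValueError("row and column sums must have the same total N")
    s = N / (m * n)  # average entry of the table
    report: Dict[str, object] = {
        "s": s,
        "moderate": s <= s0,
        # upper alpha-smooth: r_+ <= alpha*s*n = alpha*N/m and c_+ <= alpha*s*m
        "upper": max(R) <= alpha * s * n and max(C) <= alpha * s * m,
        # lower beta-smooth: r_- >= beta*s*n and c_- >= beta*s*m
        "lower": min(R) >= beta * s * n and min(C) >= beta * s * m,
    }
    if X_star is not None:
        # strongly upper alpha-smooth: every entry of X* is at most alpha*s
        report["strongly_upper"] = max(max(row) for row in X_star) <= alpha * s
    return report


if __name__ == "__main__":
    # equal column sums, so x*_ij = r_i / n by the symmetry argument of Section 3.4
    print(smoothness_report([2, 4], [3, 3], alpha=1.5, beta=0.5, s0=2.0,
                            X_star=[[1.0, 1.0], [2.0, 2.0]]))
```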
Our main results are randomized approximation algorithms of quasi-polynomial $N^{O(\ln N)}$ complexity when the margins $(R, C)$ are either:

• $s_0$-moderate and strongly upper $\alpha$-smooth, for some fixed $s_0$ and $\alpha$;

or

• lower $\beta$-smooth and strongly upper $\alpha$-smooth, for some fixed $\alpha$ and $\beta$.

By the discussion of Section 2.5, the quasi-polynomial complexity claim about our algorithm follows from bounding $p(X)$ with high probability. Specifically, we have the following two results. Their proofs follow similar arguments, but the second is more technically involved.
(3.2) Theorem. Fix $s_0 > 0$ and $\alpha \geq 1$. Suppose that $m \leq 2^n$, $n \leq 2^m$ and let $(R, C)$ be $s_0$-moderate strongly upper $\alpha$-smooth margins. Let $X = (x_{ij})$ be a random $m \times n$ matrix with density $\psi$ of Section 2.5, and let $p : \mathbb{R}^{mn}_+ \to \mathbb{R}_+$ be the function defined in Section 2.3. Then for some constant $\delta = \delta(\alpha, s_0) > 0$ and some absolute constant $\kappa > 0$, we have

$$\mathbf{P}\bigl\{X : p(X) > N^{\delta \ln N}\bigr\} \leq \kappa\,(2^{-m} + 2^{-n}).$$

Therefore, the algorithm of Section 1.1 achieves $N^{O(\ln N)}$ complexity on these classes of margins.

(3.3) Theorem. Fix $\alpha \geq 1$, $0 < \beta \leq 1$, and $\rho \geq 1$. Suppose that $m \leq \rho n$, $n \leq \rho m$ and let $(R, C)$ be lower $\beta$-smooth and strongly upper $\alpha$-smooth margins. Let $X = (x_{ij})$ be a random $m \times n$ matrix with density $\psi$ of Section 2.5 and let $p : \mathbb{R}^{mn}_+ \to \mathbb{R}_+$ be the function defined in Section 2.3. Then for some constant $\delta = \delta(\rho, \alpha, \beta) > 0$ and some absolute constant $\kappa > 0$, we have

$$\mathbf{P}\bigl\{X : p(X) > N^{\delta \ln N}\bigr\} \leq \kappa\,(2^{-m} + 2^{-n}).$$

Therefore, the algorithm of Section 1.1 achieves $N^{O(\ln N)}$ complexity on these classes of margins.

We remark that in Theorem 3.2 and Theorem 3.3 above, we can replace base 2 
by any base M > 1, fixed in advance. 

(3.4) Example: symmetric margins. While the conditions on $r_+$, $c_+$, $r_-$, and $c_-$ are straightforward to verify, to check the upper bounds for $x^*_{ij}$ one may have to solve the optimization problem (1.2.1) first. There are, however, some interesting cases where an upper bound on $x^*_{ij}$ can be inferred from symmetry considerations.

Note that if two row sums $r_{i_1}$ and $r_{i_2}$ are equal, then the transportation polytope $V(R, C)$ is invariant under the transformation which swaps the $i_1$-st and $i_2$-nd rows of a matrix $X \in V(R, C)$. Since the function $g$ in the optimization problem (1.2.1) also remains invariant if the rows are swapped and is strictly concave, we must have $x^*_{i_1 j} = x^*_{i_2 j}$ for all $j$. Similarly, if $c_{j_1} = c_{j_2}$, we must have $x^*_{i j_1} = x^*_{i j_2}$ for all $i$. In particular, if all row sums are equal, we must have $x^*_{ij} = c_j/m$. Similarly, if all column sums are equal, we must have $x^*_{ij} = r_i/n$.

More generally, one can show (see the proof of Theorem 3.5 in Section 6) that the largest entry $x^*_{ij}$ of $X^*$ necessarily lies at the intersection of a row with the largest row sum $r_+$ and a column with the largest column sum $c_+$. Therefore, if $k$ of the row sums are equal to $r_+$, we must have $x^*_{ij} \leq c_+/k$. Similarly, if $k$ of the column sums are equal to $c_+$, we must have $x^*_{ij} \leq r_+/k$.

Here are some examples of classes of margins where our algorithm provably achieves $N^{O(\ln N)}$ complexity.

• The class of margins for which at least a constant fraction of the row sums $r_i$ are equal to $r_+$:

$$\#\{i : r_i = r_+\} = \Omega(m),$$

while $m$, $n$, the row sums, and the column sums differ by at most a factor fixed in advance: $m/n = O(1)$, $n/m = O(1)$, $r_+/r_- = O(1)$, $c_+/c_- = O(1)$. Indeed, in this case we have

$$\max_{ij} x^*_{ij} = O(c_+/m) = O(N/mn),$$

and quasi-polynomiality follows by Theorem 3.3.

• The class of margins for which at least a constant fraction of the row sums $r_i$ are equal to $r_+$, while the column sums exceed the number of rows by at most a factor fixed in advance, $c_+ = O(m)$, and $m$ and $n$ are not too disparate: $m \leq 2^n$ and $n \leq 2^m$. Indeed, in this case

$$\max_{ij} x^*_{ij} = O(c_+/m) = O(1),$$

and quasi-polynomiality follows by Theorem 3.2.

• The classes of margins defined as above, but with rows and columns swapped. 

For a different source of examples, we prove that if both ratios $r_+/r_-$ and $c_+/c_-$ are not too large, the margins are strongly upper smooth. To do this, we use the following general result about the typical table $X^*$, to be proved in Section 6:

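(3.5) Theorem. Let $X^* = (x^*_{ij})$ be the typical table.

(1) We have

$$x^*_{ij} \geq \frac{r_- c_j}{r_+ m} \quad \text{and} \quad x^*_{ij} \geq \frac{c_- r_i}{c_+ n} \quad \text{for all } i, j.$$

(2) If $r_- c_+ + r_- c_- + m r_- > r_+ c_+$, then

$$x^*_{ij} \leq \frac{c_+ (r_- c_- + m r_+)}{m\,(r_- c_+ + r_- c_- + m r_- - r_+ c_+)} \quad \text{for all } i, j.$$

Similarly, if $c_- r_+ + c_- r_- + n c_- > r_+ c_+$, then

$$x^*_{ij} \leq \frac{r_+ (c_- r_- + n c_+)}{n\,(c_- r_+ + c_- r_- + n c_- - c_+ r_+)} \quad \text{for all } i, j.$$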

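The bounds of Theorem 3.5 are explicit and cheap to evaluate, which is what makes them useful for certifying strong upper smoothness without solving (1.2.1). The sketch below simply encodes Parts (1) and (2) as stated; `theorem_3_5_bounds` is a hypothetical name, and it returns `None` for the upper bound when neither hypothesis of Part (2) holds.

```python
from typing import List, Optional, Sequence, Tuple


def theorem_3_5_bounds(R: Sequence[float], C: Sequence[float]
                       ) -> Tuple[List[List[float]], Optional[float]]:
    """Lower bounds on each x*_ij (Part 1) and an upper bound on max x*_ij (Part 2)."""
    m, n = len(R), len(C)
    rp, rm, cp, cm = max(R), min(R), max(C), min(C)
    # Part (1): x*_ij >= r_- c_j / (r_+ m) and x*_ij >= c_- r_i / (c_+ n)
    lower = [[max(rm * c / (rp * m), cm * r / (cp * n)) for c in C] for r in R]
    candidates = []
    d1 = rm * cp + rm * cm + m * rm - rp * cp  # first hypothesis of Part (2)
    if d1 > 0:
        candidates.append(cp * (rm * cm + m * rp) / (m * d1))
    d2 = cm * rp + cm * rm + n * cm - cp * rp  # second hypothesis of Part (2)
    if d2 > 0:
        candidates.append(rp * (cm * rm + n * cp) / (n * d2))
    upper = min(candidates) if candidates else None
    return lower, upper


if __name__ == "__main__":
    # Equal row sums: the typical table is x*_ij = c_j / 3.
    print(theorem_3_5_bounds([3, 3, 3], [2, 4, 3]))
```

For equal row sums $R = (3, 3, 3)$, $C = (2, 4, 3)$ both parts are tight: the lower bounds equal $c_j/3 = x^*_{ij}$ and the upper bound equals $4/3 = \max_{ij} x^*_{ij}$.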
(3.6) Example: golden ratio margins. Fix

$$1 < \beta < \frac{1 + \sqrt{5}}{2} \approx 1.618$$

and a number $\rho \geq 1$. Consider the class of margins $(R, C)$ such that $m \leq \rho n$, $n \leq \rho m$, and

$$\frac{r_+}{r_-},\ \frac{c_+}{c_-} \leq \beta.$$

We claim that our algorithm has $N^{O(\ln N)}$ complexity on this class of margins.
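To see this, let

$$\beta_1 = r_+/r_- \quad \text{and} \quad \beta_2 = c_+/c_-.$$

If $\beta_1 \leq \beta_2$, then

$$r_- c_+ + r_- c_- - r_+ c_+ = (1 + \beta_2 - \beta_1\beta_2)\, r_- c_- \geq (1 + \beta_2 - \beta_2^2)\, r_- c_- \geq \epsilon\, r_- c_-$$

for some $\epsilon = \epsilon(\beta) > 0$ (recall that $1 + t - t^2 > 0$ exactly when $t < (1 + \sqrt{5})/2$), and hence by Part (2) of Theorem 3.5 we have

$$x^*_{ij} \leq \frac{c_+}{\epsilon m} + \frac{r_+}{r_-}\cdot\frac{c_+}{m} = O(s) \quad \text{for all } i, j,$$

since $c_+/m \leq \beta_2 c_-/m \leq \beta_2 s$.

Similarly, if $\beta_2 < \beta_1$, then

$$c_- r_+ + c_- r_- - c_+ r_+ = (1 + \beta_1 - \beta_1\beta_2)\, r_- c_- \geq (1 + \beta_1 - \beta_1^2)\, r_- c_- \geq \epsilon\, r_- c_-$$

for some $\epsilon = \epsilon(\beta) > 0$, and hence

$$x^*_{ij} \leq \frac{r_+}{\epsilon n} + \frac{c_+}{c_-}\cdot\frac{r_+}{n} = O(s) \quad \text{for all } i, j.$$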

In either case, the margins $(R, C)$ are strongly upper $\alpha$-smooth for some $\alpha = \alpha(\beta)$, and Theorem 3.3 implies that our algorithm has quasi-polynomial complexity on such margins. More generally, the algorithm is quasi-polynomial on the class of margins for which $\beta_1 = r_+/r_-$ and $\beta_2 = c_+/c_-$ are bounded above by a constant fixed in advance and $\beta_1\beta_2 \leq \max\{\beta_1, \beta_2\} + 1 - \epsilon$, where $\epsilon > 0$ is fixed in advance.

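The arithmetic of this example can be checked mechanically. In the sketch below, $\epsilon = 1 + \max\{\beta_1, \beta_2\} - \beta_1\beta_2$ is the coefficient from the two cases above, and the returned $\alpha = \max\{\beta_1, \beta_2\}/\epsilon + \beta_1\beta_2$ is one admissible constant that we obtained by splitting the fraction of Theorem 3.5, Part (2), as in the displayed bounds. It is an illustration, not a constant stated by the paper, and the function name is hypothetical.

```python
import math
from typing import Optional

GOLDEN = (1 + math.sqrt(5)) / 2


def golden_margin_constants(beta1: float, beta2: float) -> Optional[float]:
    """Return alpha with x*_ij <= alpha * s when eps = 1 + max(b1, b2) - b1*b2 > 0.

    In both cases of Example 3.6, Part (2) of Theorem 3.5 gives
    x*_ij <= (max(b1, b2) / eps + b1 * b2) * s; returns None if eps <= 0.
    """
    bmax = max(beta1, beta2)
    eps = 1.0 + bmax - beta1 * beta2
    if eps <= 0:
        return None
    return bmax / eps + beta1 * beta2


if __name__ == "__main__":
    for b in (1.0, 1.5, 1.6, 1.618, 1.62):
        print(b, golden_margin_constants(b, b))  # None once b exceeds the golden ratio
```

With $\beta_1 = \beta_2 = \beta$ the condition $\epsilon > 0$ reduces to $1 + \beta - \beta^2 > 0$, i.e. $\beta < (1 + \sqrt{5})/2$; the constant $\alpha$ blows up as $\beta$ approaches the golden ratio.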
(3.7) Example: linear margins. Fix $\beta \geq 1$ and $\epsilon > 0$ such that $\epsilon\beta < 1$, and consider the class of margins $(R, C)$ for which

$$r_+/r_- \leq \beta \quad \text{and} \quad c_+ \leq \epsilon m.$$

Part (2) of Theorem 3.5 implies that the margins $(R, C)$ are strongly upper $\alpha$-smooth for some $\alpha = \alpha(\beta, \epsilon)$, and therefore quasi-polynomiality of the algorithm is guaranteed by Theorem 3.2.

The remainder of this paper is devoted to the proofs of Theorems 3.2, 3.3, and 
3.5. While the proof of Theorem 3.5 is relatively straightforward, our proofs of 
Theorem 3.2 and especially Theorem 3.3 require some preparation. A general plan 
of the proofs of Theorems 3.2 and 3.3 is given in Section 5. 

4. Asymptotic estimates 

The following result, proved in [B07b], provides an asymptotic estimate for the number $\#(R, C)$ of contingency tables. It explains the role played by the optimization problem (1.2.1). It also introduces ingredients needed in the statement and proof of Theorem 5.3 given below.




(4.1) Theorem. Let $V(R, C)$ be the transportation polytope of non-negative matrices with row sums $R$ and column sums $C$, and let $X^* = (x^*_{ij})$ be the typical table, that is, the matrix $X^* \in V(R, C)$ maximizing

$$g(X) = \sum_{ij}\Bigl((x_{ij} + 1)\ln(x_{ij} + 1) - x_{ij}\ln x_{ij}\Bigr)$$

on $V(R, C)$. Let

$$\rho(R, C) = \exp\{g(X^*)\} = \max_{X \in V(R, C)} \prod_{ij} \frac{(x_{ij} + 1)^{x_{ij} + 1}}{x_{ij}^{x_{ij}}}.$$

Then

$$\rho(R, C) \geq \#(R, C) \geq N^{-\gamma(m+n)}\rho(R, C),$$

where $\gamma > 0$ is an absolute constant.
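Another representation of $\rho(R, C)$ is

$$\rho(R, C) = \inf_{0 < x_1, \ldots, x_m;\ y_1, \ldots, y_n < 1} \Bigl(\prod_{i=1}^{m} x_i^{-r_i}\Bigr)\Bigl(\prod_{j=1}^{n} y_j^{-c_j}\Bigr)\Bigl(\prod_{ij} \frac{1}{1 - x_i y_j}\Bigr).$$

A point $x_1, \ldots, x_m; y_1, \ldots, y_n$ minimizing the above product exists and is unique up to scaling $x_i \mapsto x_i t$, $y_j \mapsto y_j t^{-1}$. It is related to $X^*$ by

$$x^*_{ij} = \frac{x_i y_j}{1 - x_i y_j} \quad \text{for all } i, j.$$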


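A hedged numerical check of Theorem 4.1 on tiny margins. The sketch computes $X^*$ by alternating bisection on the dual factors (our own choice of method; here `u`, `v` play the role of $x_i$, $y_j$ above), evaluates $\ln\rho(R, C) = g(X^*)$, confirms that it agrees with the logarithm of the dual product at the computed point (this follows from $x^*_{ij} = x_i y_j/(1 - x_i y_j)$ together with the margin constraints), and compares $\rho(R, C)$ with a brute-force count. It cannot test the lower bound, since $\gamma$ is unspecified; all names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence


def _solve_factor(target: float, others: Sequence[float], iters: int = 100) -> float:
    """Bisection for t in (0, 1/max(others)) with sum_k t*o_k / (1 - t*o_k) = target."""
    lo, hi = 0.0, 1.0 / max(others)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = 0.0
        for o in others:
            q = mid * o
            if q >= 1.0:
                total = math.inf
                break
            total += q / (1.0 - q)
        lo, hi = (mid, hi) if total < target else (lo, mid)
    return 0.5 * (lo + hi)


def typical_table(R, C, sweeps: int = 2000, tol: float = 1e-10):
    """X* via x_ij = u_i v_j / (1 - u_i v_j), alternately matching row and column sums."""
    m, n = len(R), len(C)
    u, v = [0.5] * m, [0.5] * n
    X: List[List[float]] = []
    for _ in range(sweeps):
        u = [_solve_factor(r, v) for r in R]
        v = [_solve_factor(c, u) for c in C]
        X = [[u[i] * v[j] / (1.0 - u[i] * v[j]) for j in range(n)] for i in range(m)]
        if max(abs(sum(X[i]) - R[i]) for i in range(m)) < tol:
            break
    return X, u, v


def tables(R, C):
    """All non-negative integer matrices with row sums R and column sums C."""
    def rec(i, remaining):
        if i == len(R):
            if all(x == 0 for x in remaining):
                yield []
            return
        for row in itertools.product(*(range(c + 1) for c in remaining)):
            if sum(row) == R[i]:
                rest = [c - d for c, d in zip(remaining, row)]
                for tail in rec(i + 1, rest):
                    yield [list(row)] + tail
    yield from rec(0, list(C))


def g(X) -> float:
    """The objective of (1.2.1)."""
    return sum((x + 1) * math.log(x + 1) - x * math.log(x) for row in X for x in row)


def log_dual(R, C, u, v) -> float:
    """ln of prod u_i^{-r_i} prod v_j^{-c_j} prod (1 - u_i v_j)^{-1}."""
    return (-sum(r * math.log(ui) for r, ui in zip(R, u))
            - sum(c * math.log(vj) for c, vj in zip(C, v))
            - sum(math.log(1.0 - ui * vj) for ui in u for vj in v))


if __name__ == "__main__":
    R, C = [2, 2, 2], [2, 2, 2]
    X, u, v = typical_table(R, C)
    print(math.exp(g(X)), len(list(tables(R, C))))  # rho(R, C) versus #(R, C)
```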

We need the notion of the weighted enumeration of tables, as introduced in 
[Ba08] and [B07a]. 

(4.2) Weighted enumeration of tables. Fix margins $R$ and $C$ and a non-negative $m \times n$ matrix $W = (w_{ij})$. Define

$$T(R, C; W) = \sum_{D = (d_{ij})} \prod_{ij} w_{ij}^{d_{ij}},$$

where the sum is taken over all $m \times n$ non-negative integer matrices $D$ with row sums $R$ and column sums $C$, and we agree that $w_{ij}^0 = 1$. Therefore,

$$\#(R, C) = T(R, C; \mathbf{1}),$$

where $\mathbf{1}$ is the matrix of all 1's.

The estimates of Theorem 4.1 extend to weighted enumeration. We state only the part we are going to use. The following result is proved in [B07b].
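(4.3) Theorem. Let

$$\rho(R, C; W) = \inf_{\substack{x_1, \ldots, x_m > 0,\ y_1, \ldots, y_n > 0 \\ w_{ij} x_i y_j < 1 \text{ for all } i, j}} \Bigl(\prod_{i=1}^{m} x_i^{-r_i}\Bigr)\Bigl(\prod_{j=1}^{n} y_j^{-c_j}\Bigr)\Bigl(\prod_{ij} \frac{1}{1 - w_{ij} x_i y_j}\Bigr).$$

Then

$$\rho(R, C; W) \geq T(R, C; W) \geq N^{-\gamma(m+n)}\rho(R, C; W),$$

where $\gamma > 0$ is an absolute constant.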

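In fact, we will only use the upper bound of Theorem 4.3, which is actually straightforward to prove: $\prod_{ij}(1 - w_{ij}x_i y_j)^{-1}$ is the generating function for the family $T(R, C; W)$, that is, the coefficient of $x_1^{r_1}\cdots x_m^{r_m} y_1^{c_1}\cdots y_n^{c_n}$ in it is $T(R, C; W)$, and since all coefficients are non-negative, each one is bounded by the value of the series divided by the corresponding monomial.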

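The generating-function argument for the upper bound of Theorem 4.3 can be seen numerically: for every feasible point (not only the infimum), the product bounds $T(R, C; W)$ from above. The sketch below is our own brute-force illustration for tiny margins; `weighted_count` and `generating_function_bound` are hypothetical names.

```python
import itertools
import random
from typing import Sequence


def tables(R, C):
    """All non-negative integer matrices with row sums R and column sums C."""
    def rec(i, remaining):
        if i == len(R):
            if all(x == 0 for x in remaining):
                yield []
            return
        for row in itertools.product(*(range(c + 1) for c in remaining)):
            if sum(row) == R[i]:
                rest = [c - d for c, d in zip(remaining, row)]
                for tail in rec(i + 1, rest):
                    yield [list(row)] + tail
    yield from rec(0, list(C))


def weighted_count(R, C, W) -> float:
    """T(R, C; W) = sum over tables D of prod_ij w_ij^d_ij (Python gives w**0 == 1)."""
    total = 0.0
    for D in tables(R, C):
        term = 1.0
        for i, row in enumerate(D):
            for j, d in enumerate(row):
                term *= W[i][j] ** d
        total += term
    return total


def generating_function_bound(R, C, W, x: Sequence[float], y: Sequence[float]) -> float:
    """prod x_i^{-r_i} prod y_j^{-c_j} prod (1 - w_ij x_i y_j)^{-1}; needs w_ij x_i y_j < 1."""
    val = 1.0
    for i, r in enumerate(R):
        val *= x[i] ** (-r)
    for j, c in enumerate(C):
        val *= y[j] ** (-c)
    for i in range(len(R)):
        for j in range(len(C)):
            q = W[i][j] * x[i] * y[j]
            if q >= 1.0:
                raise ValueError("need w_ij * x_i * y_j < 1 for all i, j")
            val /= 1.0 - q
    return val


if __name__ == "__main__":
    R, C = [2, 1, 1], [1, 2, 1]
    W = [[1.0] * 3 for _ in range(3)]
    print(weighted_count(R, C, W), generating_function_bound(R, C, W, [0.5] * 3, [0.5] * 3))
```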
5. The plan of the proofs of Theorems 3.2 and 3.3 

To prove Theorems 3.2 and 3.3 we need to understand the behavior of the function

$$p(X) = \frac{N^N}{N!}\operatorname{per} B(X),$$

that is, to estimate values of permanents of doubly stochastic matrices. The following straightforward corollary of results of [Fa81], [Eg81], [Br73], and [So03] shows that the permanent of an $N \times N$ doubly stochastic matrix lies close to $N!/N^N$, provided the entries of the matrix are not too large. We recall the definition of the Gamma function

$$\Gamma(t) = \int_0^{+\infty} x^{t-1}e^{-x}\, dx \quad \text{for } t > 0.$$

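(5.1) Lemma. Let $B = (b_{ij})$ be an $N \times N$ doubly stochastic matrix and let

$$z_i = \max_{j=1,\ldots,N} b_{ij} \quad \text{for } i = 1, \ldots, N.$$

Suppose that

$$\sum_{i=1}^{N} z_i \leq \tau \quad \text{for some } \tau \geq 1.$$

Then

$$\frac{N!}{N^N} \leq \operatorname{per} B \leq \Bigl(\frac{\tau}{N}\Bigr)^{N}\, \Gamma^{\tau}\Bigl(\frac{N}{\tau} + 1\Bigr).$$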

We delay the proof of Lemma 5.1 until Section 7. 
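To see concretely why $\tau = O(\ln N)$ yields a ratio of $N^{O(\ln N)}$ between the two bounds, the sketch below evaluates both bounds of Lemma 5.1 in logarithmic form with `math.lgamma` (to avoid overflow). It is our own illustration; the function names are hypothetical. A Stirling estimate suggests the log-gap is roughly $\frac{\tau}{2}\ln(2\pi N/\tau)$, so with $\tau = \ln N$ the printed ratio $\text{gap}/(\ln N)^2$ should stay bounded.

```python
import math


def log_vdw_lower(N: int) -> float:
    """ln(N! / N^N), the van der Waerden lower bound."""
    return math.lgamma(N + 1) - N * math.log(N)


def log_lemma_upper(N: int, tau: float) -> float:
    """ln((tau/N)^N * Gamma(N/tau + 1)^tau), the upper bound of Lemma 5.1."""
    return N * math.log(tau / N) + tau * math.lgamma(N / tau + 1)


if __name__ == "__main__":
    for N in (100, 1000, 10000):
        tau = math.log(N)
        gap = log_lemma_upper(N, tau) - log_vdw_lower(N)
        print(N, gap, gap / math.log(N) ** 2)
```

At the extremes the two bounds behave as expected: for $\tau = 1$ they coincide, and for $\tau = N$ the upper bound is the trivial bound $\operatorname{per} B \leq 1$.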
We will apply Lemma 5.1 when $\tau = O(\ln N)$. In this case the ratio between the upper and lower bounds becomes $N^{O(\ln N)}$. In addition, we apply the lemma to the matrix $B(X)$, the doubly stochastic scaling of the random matrix $A(X)$ constructed in Theorem 2.1, see also Section 2.3.

To use the lemma, however, we need to bound the entries of $B(X)$. For that, we need to bound the entries of the matrix $Y$ obtained by scaling $X$ to have row sums $R$ and column sums $C$. To this end, we prove the following result in Section 8, which might be of independent interest.
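For the reader's convenience, here is one way to see the claim about the ratio of the bounds in Lemma 5.1. This is a routine Stirling estimate that we add; it is not part of the original argument. It uses two standard inequalities:

- $\Gamma(1+x) \leq \sqrt{2\pi x}\,(x/e)^x e^{1/(12x)}$ for real $x > 0$;
- $N! \geq \sqrt{2\pi N}\,(N/e)^N$.

It also uses $1 \leq \tau \leq N$, which holds because $1/N \leq \xi_i \leq 1$.

$$\frac{(\tau/N)^N\,\Gamma^{\tau}(1+N/\tau)}{N!/N^N} = \frac{\tau^N\,\Gamma^{\tau}(1+N/\tau)}{N!} \leq \frac{\tau^N\,(2\pi N/\tau)^{\tau/2}\,\left(\frac{N}{e\tau}\right)^N e^{\tau^2/(12N)}}{\sqrt{2\pi N}\,(N/e)^N} = \frac{(2\pi N/\tau)^{\tau/2}\,e^{\tau^2/(12N)}}{\sqrt{2\pi N}} \leq (2\pi N)^{\tau/2}e^{\tau/12}.$$

Hence the ratio is $N^{O(\tau)}$, which is $N^{O(\ln N)}$ when $\tau = O(\ln N)$.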

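(5.2) Theorem. Let $R = (r_1,\ldots,r_m)$ and $C = (c_1,\ldots,c_n)$ be positive vectors such that

$$\sum_{i=1}^m r_i = \sum_{j=1}^n c_j = N.$$

Let $X = (x_{ij})$ be an $m \times n$ positive matrix and let $Y = (y_{ij})$ be the scaling of $X$ to have row sums $R$ and column sums $C$, where

$$y_{ij} = \lambda_i\mu_jx_{ij} \quad \text{for all } i,j$$

and some positive $\lambda_1,\ldots,\lambda_m$; $\mu_1,\ldots,\mu_n$. Then, for every $1 \leq p \leq m$ and $1 \leq q \leq n$ we have

$$\ln y_{pq} \leq \ln\frac{r_pc_q}{N} + \ln x_{pq} + \ln\left(\frac{1}{N^2}\sum_{ij}r_ic_jx_{ij}\right) - \frac{1}{N}\sum_{j=1}^n c_j\ln x_{pj} - \frac{1}{N}\sum_{i=1}^m r_i\ln x_{iq}.$$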
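The inequality of Theorem 5.2,

$$\ln y_{pq} \leq \ln(r_pc_q/N) + \ln x_{pq} + \ln\Big(N^{-2}\sum_{ij} r_ic_jx_{ij}\Big) - \frac{1}{N}\sum_j c_j\ln x_{pj} - \frac{1}{N}\sum_i r_i\ln x_{iq},$$

can be checked numerically. The Python sketch below is ours, and its names are hypothetical. `scale_to_margins` is the classical alternate row/column normalization, which stands in for exact scaling. The margins and the random matrix are arbitrary test data.

```python
import math
import random


def scale_to_margins(X, R, C, sweeps=3000):
    """Alternate row/column normalization of a positive X towards margins (R, C)."""
    Y = [list(row) for row in X]
    for _ in range(sweeps):
        Y = [[r * y / sum(row) for y in row] for r, row in zip(R, Y)]
        col_sums = [sum(col) for col in zip(*Y)]
        Y = [[c * y / s for y, c, s in zip(row, C, col_sums)] for row in Y]
    return Y


def theorem_5_2_rhs(X, R, C, p, q):
    """Right-hand side of the bound of Theorem 5.2 for the entry (p, q)."""
    N = sum(R)
    m, n = len(R), len(C)
    weighted = sum(R[i] * C[j] * X[i][j] for i in range(m) for j in range(n)) / N ** 2
    return (math.log(R[p] * C[q] / N) + math.log(X[p][q]) + math.log(weighted)
            - sum(C[j] * math.log(X[p][j]) for j in range(n)) / N
            - sum(R[i] * math.log(X[i][q]) for i in range(m)) / N)


random.seed(2)
R, C = [3, 5, 4, 8], [6, 2, 7, 4, 1]          # both sum to N = 20
X = [[random.uniform(0.05, 1.0) for _ in C] for _ in R]
Y = scale_to_margins(X, R, C)
slack = min(theorem_5_2_rhs(X, R, C, p, q) - math.log(Y[p][q])
            for p in range(len(R)) for q in range(len(C)))
print(slack)
```

If all entries of $X$ are equal, then $Y = (r_pc_q/N)$ and the two sides coincide, so the inequality is tight in that case.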



Now suppose that $(R,C)$ are upper $\alpha$-smooth margins, that is, $r_i/N \leq \alpha/m$ and $c_j/N \leq \alpha/n$ for some $\alpha \geq 1$, fixed in advance. To give an idea of the remainder of the argument and the role of the hypotheses, suppose further that the $x_{ij}$ are sampled independently at random from the uniform distribution on $[0,1]$. Then Theorem 5.2 and the law of large numbers clearly imply that, as $m$ and $n$ grow, with overwhelming probability we have

$$y_{ij} \leq \kappa\frac{r_ic_j}{N}x_{ij} \quad \text{for all } i,j$$

and some absolute constant $\kappa > 1$. If we construct the doubly stochastic matrix $B(X)$ as in Section 2.3, then with overwhelming probability the entries $b_{ij}$ will satisfy

$$b_{ij} \leq \frac{\kappa}{N} \quad \text{for all } i,j.$$
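To get a feel for this heuristic, one can compute the factor that Theorem 5.2 puts in front of $(r_pc_q/N)x_{pq}$; no scaling is needed for that. For independent uniform entries:

- $N^{-2}\sum_{ij}r_ic_jx_{ij}$ concentrates near $1/2$;
- each $\frac{1}{N}\sum_j c_j\ln x_{pj}$ concentrates near $\mathbf{E}\ln x = -1$.

So for a typical pair $(p,q)$ the factor is close to $e^2/2 \approx 3.69$, and the maximum over all pairs is somewhat larger. The margins below are our own hypothetical example, and `scaling_factor` is an illustrative name.

```python
import math
import random


def scaling_factor(X, R, C):
    """max over (p, q) of the factor multiplying (r_p c_q / N) x_pq in Theorem 5.2."""
    N = sum(R)
    m, n = len(R), len(C)
    weighted = sum(R[i] * C[j] * X[i][j] for i in range(m) for j in range(n)) / N ** 2
    row = [sum(C[j] * math.log(X[p][j]) for j in range(n)) / N for p in range(m)]
    col = [sum(R[i] * math.log(X[i][q]) for i in range(m)) / N for q in range(n)]
    # the row term and the column term are maximized independently
    return weighted * math.exp(-min(row) - min(col))


random.seed(3)
R = [2, 3] * 30                                # upper smooth margins, N = 150
C = [3, 2] * 30
X = [[1.0 - random.random() for _ in C] for _ in R]   # uniform on (0, 1]
kappa = scaling_factor(X, R, C)
print(kappa, math.e ** 2 / 2)
```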



However, in the situation of our proof, the matrix $X = (x_{ij})$ is actually sampled from the distribution with density $\psi$ of Section 2.5. Thus to perform a similar analysis, we need to show that the entries of a random matrix $X$ are uniformly small. For that, we have to assume that the margins $(R,C)$ are strongly upper $\alpha$-smooth (in fact, one can show that the mere condition of upper smoothness is not enough). Specifically, in Section 9, we prove the following result:

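(5.3) Theorem. Let

$$S \subset \{(i,j):\ i = 1,\ldots,m;\ j = 1,\ldots,n\}$$

be a set of indices, and let $X = (x_{ij})$ be a random $m \times n$ matrix with density $\psi = \psi_{R,C}$ of Section 2.5. Suppose that the typical table $X^* = (x^*_{ij})$ satisfies

$$x^*_{ij} \leq \lambda \quad \text{for all } i,j$$

and some $\lambda > 0$. Then for all $t > 0$ we have

$$\mathbf{P}\left\{\sum_{(i,j)\in S}x_{ij} > t\right\} \leq \exp\left\{-\frac{t}{2\lambda+2}\right\}4^{\#S}N^{\gamma(m+n)},$$

where $\gamma > 0$ is the absolute constant of Theorem 4.1.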

In Section 10 we complete the proof of Theorem 3.2. Theorem 3.3 requires some 
more work and its proof is given in Section 12, after some technical estimates in 
Section 11. 



6. Proof of Theorem 3.5 

First, we observe that the typical table $X^* = (x^*_{ij})$ is strictly positive, that is, it lies in the interior of the transportation polytope $\mathcal{P}(R,C)$. Indeed, suppose that $x^*_{11} = 0$, for example. Choose indices $p$ and $q$ such that $x^*_{1q} > 0$ and $x^*_{p1} > 0$. Then necessarily $x^*_{pq} < r_p, c_q$, and we can consider a perturbation $X(\epsilon) \in \mathcal{P}(R,C)$ of $X^*$ defined for sufficiently small $\epsilon > 0$ by

$$x_{ij}(\epsilon) = \begin{cases} x^*_{ij} + \epsilon & \text{if } i = 1 \text{ and } j = 1,\\ x^*_{ij} - \epsilon & \text{if } i = p,\ j = 1 \text{ or } i = 1,\ j = q,\\ x^*_{ij} + \epsilon & \text{if } i = p \text{ and } j = q,\\ x^*_{ij} & \text{otherwise.}\end{cases}$$

Since the value of

$$\frac{\partial}{\partial x_{ij}}g(X) = \ln(x_{ij} + 1) - \ln x_{ij}$$

is equal to $+\infty$ at $x_{ij} = 0$ (we consider the right derivative in this case) and finite if $x_{ij} > 0$, we conclude that for a sufficiently small $\epsilon > 0$, the matrix $X(\epsilon)$ attains a larger value of $g(X)$, which is a contradiction. We conclude that all the entries of the typical table $X^*$ are strictly positive.

Since $X^*$ lies in the interior of the transportation polytope $\mathcal{P}(R,C)$, the Lagrange multiplier condition implies that

$$\ln\left(\frac{x^*_{ij}+1}{x^*_{ij}}\right) = \lambda_i + \mu_j \quad \text{for all } i,j \tag{6.1}$$

and some $\lambda_1,\ldots,\lambda_m$ and $\mu_1,\ldots,\mu_n$. Suppose that $x^*_{i_1j} > x^*_{i_2j}$ for some row indices $i_1, i_2$ and some column index $j$. Then $\lambda_{i_1} < \lambda_{i_2}$, and hence $x^*_{i_1j} > x^*_{i_2j}$ for the same row indices $i_1$ and $i_2$ and all column indices $j$.

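We prove Part (1) first. Let us choose a row $i_0$ with the largest row sum $r_+$. Without loss of generality, we assume that $i_0 = 1$. Hence

$$x^*_{1j} \geq x^*_{ij} \quad \text{for } i = 1,\ldots,m \text{ and } j = 1,\ldots,n,$$

since otherwise, by the observation above, the $i$-th row would strictly dominate the first row entrywise, contradicting $r_i \leq r_+$. Therefore,

$$x^*_{1j} \geq \frac{c_j}{m} \geq \frac{c_-}{m} \quad \text{for } j = 1,\ldots,n.$$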

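Let us compare the entries in the first row and in the $i$-th row. From (6.1) we have

$$\ln\left(\frac{x^*_{1j}+1}{x^*_{1j}}\right) - \ln\left(\frac{x^*_{ij}+1}{x^*_{ij}}\right) = \lambda_1 - \lambda_i \quad \text{for } j = 1,\ldots,n. \tag{6.2}$$

Since

$$\sum_{j=1}^n x^*_{1j} = r_+ \quad\text{and}\quad \sum_{j=1}^n x^*_{ij} \geq r_-,$$

there exists $j$ such that

$$\frac{x^*_{ij}}{x^*_{1j}} \geq \frac{r_-}{r_+}.$$

We apply (6.2) with that index $j$. We have

$$\lambda_1 - \lambda_i = \ln\frac{(x^*_{1j}+1)\,x^*_{ij}}{(x^*_{ij}+1)\,x^*_{1j}}. \tag{6.3}$$

Now, the minimum value of

$$\frac{(a+1)b}{(b+1)a} \quad \text{where } a \geq b \geq \tau a \text{ and } a \geq \sigma$$

is attained at $a = \sigma$ and $b = \tau\sigma$ and is equal to

$$\frac{\tau\sigma + \tau}{\tau\sigma + 1}.$$

In our case (6.3),

$$a = x^*_{1j},\quad b = x^*_{ij},\quad \sigma = \frac{c_-}{m},\quad \tau = \frac{r_-}{r_+},\quad\text{and}\quad \frac{\tau\sigma + \tau}{\tau\sigma + 1} = \frac{r_-c_- + mr_-}{r_-c_- + mr_+}.$$

Hence

$$\lambda_1 - \lambda_i \geq \ln\frac{r_-c_- + mr_-}{r_-c_- + mr_+}.$$

Therefore, for every $j$,

$$\ln\left(\frac{x^*_{ij}+1}{x^*_{ij}}\right) \leq \ln\left(\frac{x^*_{1j}+1}{x^*_{1j}}\right) - \ln\frac{r_-c_- + mr_-}{r_-c_- + mr_+} \leq \ln\frac{c_- + m}{c_-} + \ln\frac{r_-c_- + mr_+}{r_-c_- + mr_-}.$$

Hence

$$\frac{x^*_{ij}+1}{x^*_{ij}} \leq \frac{(c_- + m)(r_-c_- + mr_+)}{c_-(r_-c_- + mr_-)} = \frac{r_-c_- + r_+m}{r_-c_-} \quad \text{for } j = 1,\ldots,n$$

and

$$x^*_{ij} \geq \frac{r_-c_-}{r_+m},$$

as desired. The second inequality in Part (1) is proved similarly.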

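To prove Part (2), we use an approach similar to that for Part (1), as well as its inequality. Let $i_0$ be a row such that $r_{i_0} = r_-$. Without loss of generality, we assume that $i_0 = 1$ and hence

$$x^*_{1j} \leq x^*_{ij} \quad \text{for } i = 1,\ldots,m \text{ and } j = 1,\ldots,n.$$

Thus we have

$$x^*_{1j} \leq \frac{c_j}{m} \leq \frac{c_+}{m} \quad \text{for } j = 1,\ldots,n.$$

Next, we compare the entries of the $i$-th row of $X^*$ with the entries of the first row using (6.2). Since

$$\sum_{j=1}^n x^*_{ij} \leq r_+ \quad\text{and}\quad \sum_{j=1}^n x^*_{1j} = r_-,$$

there is $j$ such that

$$\frac{x^*_{ij}}{x^*_{1j}} \leq \frac{r_+}{r_-}.$$

We apply (6.3) with that index $j$. The maximum value of

$$\frac{(a+1)b}{(b+1)a} \quad \text{where } a \leq b \leq \tau a \text{ and } a \geq \sigma$$

is attained at $a = \sigma$, $b = \tau\sigma$ and is equal to

$$\frac{\tau\sigma + \tau}{\tau\sigma + 1}.$$

In our case of (6.3),

$$a = x^*_{1j},\quad b = x^*_{ij},\quad \tau = \frac{r_+}{r_-},\quad \sigma = \frac{r_-c_-}{r_+m},\quad\text{and}\quad \frac{\tau\sigma + \tau}{\tau\sigma + 1} = \frac{r_-c_- + mr_+}{r_-c_- + mr_-},$$

where the expression for $\sigma$ follows by Part (1). Hence

$$\lambda_1 - \lambda_i \leq \ln\frac{r_-c_- + mr_+}{r_-c_- + mr_-}$$

and for all $j$ we have

$$\ln\left(\frac{x^*_{ij}+1}{x^*_{ij}}\right) \geq \ln\left(\frac{x^*_{1j}+1}{x^*_{1j}}\right) - \ln\frac{r_-c_- + mr_+}{r_-c_- + mr_-}.$$

Hence

$$\ln\left(\frac{x^*_{ij}+1}{x^*_{ij}}\right) \geq \ln\frac{c_+ + m}{c_+} - \ln\frac{r_-c_- + mr_+}{r_-c_- + mr_-},$$

that is,

$$\frac{x^*_{ij}+1}{x^*_{ij}} \geq \frac{(c_+ + m)(r_-c_- + mr_-)}{c_+(r_-c_- + mr_+)} \quad \text{for } j = 1,\ldots,n,$$

and the proof follows. □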
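The inequalities obtained in this proof can be tested on a small example. The sketch assumes, as the proof uses them, that $r_+, r_-$ ($c_+, c_-$) denote the largest and smallest row (column) sums. The solver is our own illustration, and `typical_table` is a hypothetical name. It computes $X^*$ from the stationarity condition (6.1), $x^*_{ij} = 1/(e^{\lambda_i+\mu_j} - 1)$. It does so by block coordinate descent on the dual: each sweep picks every $\lambda_i$ to fix the $i$-th row sum, then every $\mu_j$ to fix the $j$-th column sum, each by bisection. The checks are:

- $x^*_{ij} \geq r_-c_-/(r_+m)$;
- the transposed bound $x^*_{ij} \geq r_-c_-/(c_+n)$;
- $(x^*_{ij}+1)/x^*_{ij} \geq (c_++m)(r_-c_-+mr_-)/(c_+(r_-c_-+mr_+))$.

```python
import math


def typical_table(R, C, sweeps=400, steps=80):
    """Maximizer of g over P(R, C), found from the stationarity condition (6.1).

    (6.1) gives x*_ij = 1 / (exp(lam_i + mu_j) - 1).  Each sweep chooses every lam_i
    so that row i sums to r_i, then every mu_j so that column j sums to c_j.
    """
    def solve(target, others):
        # t -> sum_k 1 / (exp(t + o_k) - 1) decreases from +inf to 0 on (-min(others), inf)
        lo, hi = -min(others) + 1e-9, -min(others) + 60.0
        for _ in range(steps):
            mid = (lo + hi) / 2
            if sum(1.0 / math.expm1(mid + o) for o in others) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lam, mu = [1.0] * len(R), [1.0] * len(C)
    for _ in range(sweeps):
        lam = [solve(r, mu) for r in R]
        mu = [solve(c, lam) for c in C]
    return [[1.0 / math.expm1(a + b) for b in mu] for a in lam]


R, C = [3, 5, 4], [2, 6, 4]                    # N = 12, m = n = 3
m, n = len(R), len(C)
X = typical_table(R, C)
r_lo, r_hi, c_lo, c_hi = min(R), max(R), min(C), max(C)
part1_rows = r_lo * c_lo / (r_hi * m)          # x*_ij >= r_- c_- / (r_+ m)
part1_cols = r_lo * c_lo / (c_hi * n)          # the transposed ("similar") bound
part2 = (c_hi + m) * (r_lo * c_lo + m * r_lo) / (c_hi * (r_lo * c_lo + m * r_hi))
smallest = min(min(row) for row in X)
print(smallest, part1_rows, part1_cols)
```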

7. Proof of Lemma 5.1 

We will use the following bounds for the permanent. 
(7.1) The van der Waerden bound. Let $B = (b_{ij})$ be an $N \times N$ doubly stochastic matrix, that is,

$$\sum_{j=1}^N b_{ij} = 1 \ \text{ for } i = 1,\ldots,N \quad\text{and}\quad \sum_{i=1}^N b_{ij} = 1 \ \text{ for } j = 1,\ldots,N$$

and

$$b_{ij} \geq 0 \quad \text{for } i,j = 1,\ldots,N.$$

Then

$$\operatorname{per}B \geq \frac{N!}{N^N}.$$

This is the famous van der Waerden bound proved by Falikman [Fa81] and Egorychev [Eg81], see also Chapter 12 of [LW01] and [G06a].

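(7.2) The continuous version of the Bregman-Minc bound. Let $B = (b_{ij})$ be an $N \times N$ matrix such that

$$\sum_{j=1}^N b_{ij} \leq 1 \quad \text{for } i = 1,\ldots,N$$

and

$$b_{ij} \geq 0 \quad \text{for } i,j = 1,\ldots,N.$$

Furthermore, let

$$\xi_i = \max_{j=1,\ldots,N} b_{ij} > 0 \quad \text{for } i = 1,\ldots,N.$$

Then

$$\operatorname{per}B \leq \prod_{i=1}^N \xi_i\,\Gamma^{\xi_i}\left(\frac{1+\xi_i}{\xi_i}\right).$$

This bound was obtained by Soules [So03].

If $\xi_i = 1/r_i$ for integers $r_i$, the bound transforms into

$$\operatorname{per}B \leq \prod_{i=1}^N\frac{(r_i!)^{1/r_i}}{r_i},$$

which can be easily deduced from the Minc conjecture proved by Bregman, see [Br73].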
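Both bounds can be spot-checked on a random 0-1 matrix. The Python sketch below uses helper names of our own. It checks Bregman's inequality $\operatorname{per}A \leq \prod_i (r_i!)^{1/r_i}$ for a 0-1 matrix $A$ with row sums $r_i$. It also confirms that, for $B$ obtained by dividing the $i$-th row of $A$ by $r_i$ (so $\xi_i = 1/r_i$), Soules' bound reduces to $\prod_i (r_i!)^{1/r_i}/r_i$.

```python
import itertools
import math
import random


def permanent(A):
    """Permanent by summing over all permutations; fine for N <= 7."""
    n = len(A)
    return sum(math.prod(A[i][s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))


def soules_bound(B):
    """prod_i xi_i * Gamma((1 + xi_i) / xi_i) ** xi_i with xi_i = max_j b_ij."""
    bound = 1.0
    for row in B:
        xi = max(row)
        bound *= xi * math.gamma((1 + xi) / xi) ** xi
    return bound


def bregman_bound(A):
    """prod_i (r_i!) ** (1 / r_i) for a 0-1 matrix with row sums r_i >= 1."""
    return math.prod(math.factorial(sum(row)) ** (1 / sum(row)) for row in A)


random.seed(4)
n = 6
A = [[1 if random.random() < 0.5 else 0 for _ in range(n)] for _ in range(n)]
for i in range(n):
    A[i][i] = 1                                   # every row sum is at least 1
B = [[a / sum(row) for a in row] for row in A]    # row sums 1, xi_i = 1 / r_i
print(permanent(A), bregman_bound(A))
```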

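Now we are ready to prove Lemma 5.1.

Proof of Lemma 5.1. The lower bound is the van der Waerden bound. To prove the upper bound, define

$$f(\xi) = \xi\ln\Gamma\left(\frac{1+\xi}{\xi}\right) + \ln\xi \quad \text{for } 0 < \xi \leq 1.$$

Then $f$ is a concave function and by the Bregman-Minc bound, we have

$$\ln\operatorname{per}B \leq \sum_{i=1}^N f(\xi_i).$$

The function

$$F(x) = \sum_{i=1}^N f(\xi_i) \quad \text{for } x = (\xi_1,\ldots,\xi_N)$$

is concave on the simplex defined by the equation $\xi_1 + \ldots + \xi_N = \tau$ and inequalities $\xi_i \geq 0$ for $i = 1,\ldots,N$. It is also symmetric under permutations of $\xi_1,\ldots,\xi_N$. Hence the maximum of $F$ is attained at

$$\xi_1 = \ldots = \xi_N = \tau/N,$$

and so

$$\ln\operatorname{per}B \leq Nf\left(\frac{\tau}{N}\right).$$

Thus

$$\operatorname{per}B \leq \left(\frac{\tau}{N}\right)^N\Gamma^{\tau}\left(1 + \frac{N}{\tau}\right),$$

and the rest follows by Stirling's formula. □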

8. Proof of Theorem 5.2 

We begin our proof by restating a theorem of Bregman [Br73] in a slightly more 
general form. 

(8.1) Theorem. Let $Y = (y_{ij})$ be the positive $m \times n$ matrix that is the scaling of a positive $m \times n$ matrix $X = (x_{ij})$ to have margins $(R,C)$. Then

$$\sum_{ij}y_{ij}(\ln y_{ij} - \ln x_{ij}) \leq \sum_{ij}z_{ij}(\ln z_{ij} - \ln x_{ij})$$

for every matrix $Z \in \mathcal{P}(R,C)$, where $\mathcal{P}(R,C)$ is the transportation polytope of $m \times n$ non-negative matrices with row sums $R$ and column sums $C$.

Proof. The function

$$f(Z) = \sum_{ij}z_{ij}(\ln z_{ij} - \ln x_{ij})$$

is strictly convex on $\mathcal{P}(R,C)$ and hence attains its unique minimum $Y' = (y'_{ij})$ on $\mathcal{P}(R,C)$. As in the proof of Theorem 3.5 (see Section 6), we can show that $Y'$ is strictly positive, that is, $Y'$ lies in the relative interior of $\mathcal{P}(R,C)$. Writing the Lagrange multiplier conditions, we obtain

$$\ln y'_{ij} - \ln x_{ij} = \xi_i + \eta_j$$

for some $\xi_1,\ldots,\xi_m$ and $\eta_1,\ldots,\eta_n$. Letting $\lambda_i = e^{\xi_i}$ and $\mu_j = e^{\eta_j}$, we obtain

$$y'_{ij} = \lambda_i\mu_jx_{ij} \quad \text{for all } i,j,$$

so in fact $Y' = Y$ as desired. □
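Theorem 8.1 says that the scaling $Y$ is the minimizer of $f(Z) = \sum_{ij} z_{ij}(\ln z_{ij} - \ln x_{ij})$ over $\mathcal{P}(R,C)$. The Python sketch below illustrates this; the helper names are ours. It compares $f(Y)$ with two other points of the polytope:

- an integer vertex built by the northwest-corner rule, using the convention $0\ln 0 = 0$;
- the scaling of a different random matrix.

The scaling itself is approximated by alternate row/column normalization.

```python
import math
import random


def scale_to_margins(X, R, C, sweeps=3000):
    """Alternate row/column normalization of a positive X towards margins (R, C)."""
    Y = [list(row) for row in X]
    for _ in range(sweeps):
        Y = [[r * y / sum(row) for y in row] for r, row in zip(R, Y)]
        col_sums = [sum(col) for col in zip(*Y)]
        Y = [[c * y / s for y, c, s in zip(row, C, col_sums)] for row in Y]
    return Y


def objective(Z, X):
    """f(Z) = sum_ij z_ij (ln z_ij - ln x_ij), with the convention 0 ln 0 = 0."""
    return sum(z * (math.log(z) - math.log(x))
               for zrow, xrow in zip(Z, X) for z, x in zip(zrow, xrow) if z > 0)


def northwest_corner(R, C):
    """A vertex of P(R, C) built by the northwest-corner rule (integer margins)."""
    r, c = list(R), list(C)
    Z = [[0] * len(C) for _ in R]
    i = j = 0
    while i < len(r) and j < len(c):
        t = min(r[i], c[j])
        Z[i][j] = t
        r[i] -= t
        c[j] -= t
        if r[i] == 0:
            i += 1
        else:
            j += 1
    return Z


random.seed(5)
R, C = [4, 1, 3], [2, 2, 4]
X = [[random.uniform(0.1, 2.0) for _ in C] for _ in R]
Y = scale_to_margins(X, R, C)
Z1 = northwest_corner(R, C)
Z2 = scale_to_margins([[random.uniform(0.1, 2.0) for _ in C] for _ in R], R, C)
print(objective(Y, X), objective(Z1, X), objective(Z2, X))
```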

Next, we prove a lemma that extends a result of Linial, Samorodnitsky, and 
Wigderson [L+00]. 

(8.2) Lemma. Let $R = (r_1,\ldots,r_m)$ and $C = (c_1,\ldots,c_n)$ be positive vectors such that

$$\sum_{i=1}^m r_i = \sum_{j=1}^n c_j = N.$$

Let $X = (x_{ij})$ be an $m \times n$ positive matrix such that

$$\sum_{ij}x_{ij} = N$$

and let $Y = (y_{ij})$ be the scaling of $X$ to have row sums $R$ and column sums $C$. Then

$$\sum_{ij}r_ic_j\ln y_{ij} \geq \sum_{ij}r_ic_j\ln x_{ij}.$$
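The proof of Lemma 8.2 reduces it to a monotonicity claim. Start from a matrix whose entries sum to $N$. Then each half-step of alternate scaling (rows to $R$, then columns to $C$) can only increase $\Phi(X) = \sum_{ij} r_ic_j\ln x_{ij}$. The Python sketch below, with hypothetical names of our own, records $\Phi$ along the iteration so that the claim can be watched numerically.

```python
import math
import random


def phi(X, R, C):
    """Phi(X) = sum_ij r_i c_j ln x_ij, the quantity compared in Lemma 8.2."""
    return sum(r * c * math.log(x) for r, row in zip(R, X) for c, x in zip(C, row))


def alternate_scaling_trace(X, R, C, sweeps=50):
    """Phi after each half-step of alternate row/column scaling, starting from sum N."""
    N = sum(R)
    total = sum(map(sum, X))
    Y = [[N * x / total for x in row] for row in X]          # now sum_ij y_ij = N
    trace = [phi(Y, R, C)]
    for _ in range(sweeps):
        Y = [[r * y / sum(row) for y in row] for r, row in zip(R, Y)]        # rows -> R
        trace.append(phi(Y, R, C))
        col_sums = [sum(col) for col in zip(*Y)]
        Y = [[c * y / s for y, c, s in zip(row, C, col_sums)] for row in Y]  # columns -> C
        trace.append(phi(Y, R, C))
    return trace


random.seed(8)
R, C = [2, 5, 3], [4, 4, 2]
X = [[random.uniform(0.1, 3.0) for _ in C] for _ in R]
trace = alternate_scaling_trace(X, R, C)
print(trace[0], trace[-1])
```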

Proof. The matrix $Y$ is the limit of the sequence obtained from $X$ by repeated alternate scaling of the rows to have row sums $r_1,\ldots,r_m$ and of the columns to have column sums $c_1,\ldots,c_n$; cf., for example, Chapter 6 of [BR97]. It therefore suffices to show that when the rows (columns) are scaled, the corresponding weighted sums of the logarithms of the entries of the matrix can only increase.

To this end, let $X = (x_{ij})$ be a positive $m \times n$ matrix with the row sums $a_1,\ldots,a_m$ such that

$$\sum_{i=1}^m a_i = N$$

and let $Y = (y_{ij})$ be the matrix obtained from $X$ by scaling the rows to have sums $r_1,\ldots,r_m$. Hence,

$$y_{ij} = \frac{r_ix_{ij}}{a_i} \quad \text{for all } i,j.$$

Thus

$$\sum_{ij}r_ic_j(\ln y_{ij} - \ln x_{ij}) = \sum_{j=1}^n c_j\left(\sum_{i=1}^m(r_i\ln r_i - r_i\ln a_i)\right) \geq 0,$$

since the maximum of the function

$$\sum_{i=1}^m r_i\ln\xi_i$$

on the simplex

$$\sum_{i=1}^m\xi_i = N \quad\text{and}\quad \xi_i \geq 0 \ \text{ for } i = 1,\ldots,m$$

is attained at $\xi_i = r_i$. Note also that scaling the rows preserves the total sum $N$ of the entries, so the argument can be repeated at every step. The scaling of columns is treated similarly. □

Proof of Theorem 5.2. Without loss of generality, we assume that $p = q = 1$. Define an $m \times n$ matrix $U = (u_{ij})$ by



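$$u_{ij} = \frac{Nr_ic_jx_{ij}}{\sum_{k,l}r_kc_lx_{kl}} \quad \text{for all } i,j. \tag{8.3}$$

We note that the scalings of $U$ and $X$ to margins $(R,C)$ coincide and that

$$\sum_{ij}u_{ij} = N.$$

By Theorem 8.1, the matrix $Y$ minimizes

$$\sum_{ij}z_{ij}(\ln z_{ij} - \ln u_{ij})$$

over the set $\mathcal{P}(R,C)$ of $m \times n$ non-negative matrices $Z$ with row sums $R$ and column sums $C$.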

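For a real $t$, let us define the matrix $Y(t) = (y_{ij}(t))$ by

$$y_{ij}(t) = \begin{cases} y_{11} + t & \text{if } i = j = 1,\\ y_{i1} - \dfrac{r_i}{N - r_1}\,t & \text{if } i \neq 1,\ j = 1,\\ y_{1j} - \dfrac{c_j}{N - c_1}\,t & \text{if } i = 1,\ j \neq 1,\\ y_{ij} + \dfrac{r_ic_j}{(N - r_1)(N - c_1)}\,t & \text{if } i \neq 1,\ j \neq 1.\end{cases}$$

Then $Y(0) = Y$ and $Y(t) \in \mathcal{P}(R,C)$ for all $t$ sufficiently close to 0. Therefore,

$$\frac{d}{dt}f(Y(t))\Big|_{t=0} = 0,$$

where

$$f(Z) = \sum_{ij}z_{ij}(\ln z_{ij} - \ln u_{ij}).$$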
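Therefore,

$$(\ln y_{11} - \ln u_{11} + 1) - \sum_{j\neq 1}\frac{c_j}{N - c_1}(\ln y_{1j} - \ln u_{1j} + 1) - \sum_{i\neq 1}\frac{r_i}{N - r_1}(\ln y_{i1} - \ln u_{i1} + 1) + \sum_{i\neq 1,\,j\neq 1}\frac{r_ic_j}{(N - r_1)(N - c_1)}(\ln y_{ij} - \ln u_{ij} + 1) = 0.$$

Rearranging the summands (the constant terms cancel), we get

$$\frac{N^2}{(N - r_1)(N - c_1)}(\ln y_{11} - \ln u_{11}) - \frac{N}{(N - r_1)(N - c_1)}\sum_{j=1}^n c_j(\ln y_{1j} - \ln u_{1j}) - \frac{N}{(N - r_1)(N - c_1)}\sum_{i=1}^m r_i(\ln y_{i1} - \ln u_{i1}) + \frac{1}{(N - r_1)(N - c_1)}\sum_{ij}r_ic_j(\ln y_{ij} - \ln u_{ij}) = 0.$$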

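On the other hand, by Lemma 8.2,

$$\sum_{ij}r_ic_j(\ln y_{ij} - \ln u_{ij}) \geq 0,$$

so we must have

$$N^2(\ln y_{11} - \ln u_{11}) - N\sum_{j=1}^n c_j(\ln y_{1j} - \ln u_{1j}) - N\sum_{i=1}^m r_i(\ln y_{i1} - \ln u_{i1}) \leq 0.$$

In other words,

$$\ln y_{11} \leq \ln u_{11} + \frac{1}{N}\sum_{j=1}^n c_j(\ln y_{1j} - \ln u_{1j}) + \frac{1}{N}\sum_{i=1}^m r_i(\ln y_{i1} - \ln u_{i1}).$$

Since

$$\sum_{j=1}^n y_{1j} = r_1,$$

we have

$$\sum_{j=1}^n c_j\ln y_{1j} \leq \sum_{j=1}^n c_j\ln\frac{r_1c_j}{N},$$

cf. the proof of Lemma 8.2. Similarly, since

$$\sum_{i=1}^m y_{i1} = c_1,$$

we have

$$\sum_{i=1}^m r_i\ln y_{i1} \leq \sum_{i=1}^m r_i\ln\frac{r_ic_1}{N}.$$

Substituting (8.3) for $U$, we obtain

$$\ln y_{11} \leq \ln x_{11} + \ln\frac{r_1c_1}{N} + \ln\left(\frac{1}{N^2}\sum_{ij}r_ic_jx_{ij}\right) - \frac{1}{N}\sum_{j=1}^n c_j\ln x_{1j} - \frac{1}{N}\sum_{i=1}^m r_i\ln x_{i1},$$

and the proof follows. □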

9. Proof of Theorem 5.3 

Fix margins $(R,C)$, let $\psi = \psi_{R,C}$ be the density of Section 2.5, and let $X = (x_{ij})$ be the random matrix distributed in accordance with the density $\psi$. We will need a lemma that connects linear functionals of $X$ with the weighted sums $T(R,C;W)$ of Section 4.2.

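(9.1) Lemma. Let $\lambda_{ij} < 1$ be real numbers.

(1) Let $W = (w_{ij})$ be the $m \times n$ matrix of weights given by

$$w_{ij} = (1 - \lambda_{ij})^{-1} \quad \text{for all } i,j.$$

Then

$$\mathbf{E}\exp\left\{\sum_{ij}\lambda_{ij}x_{ij}\right\} = \frac{T(R,C;W)}{\#(R,C)}\prod_{ij}w_{ij}.$$

(2) We have

$$\mathbf{E}\prod_{ij}x_{ij}^{-\lambda_{ij}} = \frac{1}{\#(R,C)}\sum_{D=(d_{ij})}\prod_{ij}\frac{\Gamma(d_{ij} - \lambda_{ij} + 1)}{\Gamma(d_{ij} + 1)},$$

where the sum is taken over all $m \times n$ non-negative integer matrices $D = (d_{ij})$ with row sums $R$ and column sums $C$.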
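Proof. Let us prove Part (1). We have

$$\mathbf{E}\exp\left\{\sum_{ij}\lambda_{ij}x_{ij}\right\} = \frac{1}{\#(R,C)}\int_{\mathbb{R}^{mn}_+}\exp\left\{-\sum_{ij}(1 - \lambda_{ij})x_{ij}\right\}\sum_{D=(d_{ij})}\prod_{ij}\frac{x_{ij}^{d_{ij}}}{d_{ij}!}\,dX = \frac{1}{\#(R,C)}\sum_{D=(d_{ij})}\prod_{ij}(1 - \lambda_{ij})^{-d_{ij}-1} = \frac{T(R,C;W)}{\#(R,C)}\prod_{ij}w_{ij},$$

as desired. Since

$$\frac{1}{d!}\int_0^{+\infty}x^{d-\lambda}e^{-x}\,dx = \frac{\Gamma(d - \lambda + 1)}{\Gamma(d + 1)} \quad \text{for } \lambda < 1,$$

the proof of Part (2) follows. □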

To prove Theorem 5.3 we need only Part (1) of the lemma, while Part (2) will 
be used later in the proof of Theorem 3.3. 
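The first display in the proof of Lemma 9.1 shows that $\psi(X) = \#(R,C)^{-1}e^{-\sum x_{ij}}\sum_D\prod_{ij}x_{ij}^{d_{ij}}/d_{ij}!$. This is a uniform mixture, over tables $D$ with margins $(R,C)$, of products of $\mathrm{Gamma}(d_{ij}+1,1)$ densities. That reading is ours; the text only uses the integral.

The Python sketch below, with hypothetical helper names, does two things. It checks Part (1) term by term on a tiny example. It also samples from the mixture, confirming that $\mathbf{E}\sum_{ij}x_{ij} = N + mn$ for every table $D$. This is the scale that appears in Lemma 9.3.

```python
import math
import random


def compositions(total, caps):
    """Tuples of non-negative integers with sum `total`, entry j at most caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield (total,)
        return
    for d in range(min(total, caps[0]) + 1):
        for tail in compositions(total - d, caps[1:]):
            yield (d,) + tail


def tables(R, C):
    """All non-negative integer matrices with row sums R and column sums C."""
    def rec(i, cols_left):
        if i == len(R) - 1:
            if sum(cols_left) == R[i]:
                yield (tuple(cols_left),)
            return
        for row in compositions(R[i], cols_left):
            for rest in rec(i + 1, [c - d for c, d in zip(cols_left, row)]):
                yield (row,) + rest
    yield from rec(0, list(C))


def mgf_via_mixture(R, C, Lam):
    """E exp(sum lam_ij x_ij) if D is a uniform table and x_ij ~ Gamma(d_ij + 1, 1)."""
    tabs = list(tables(R, C))
    m, n = len(R), len(C)
    return sum(math.prod((1 - Lam[i][j]) ** -(D[i][j] + 1)
                         for i in range(m) for j in range(n))
               for D in tabs) / len(tabs)


def lemma_9_1_part1(R, C, Lam):
    """T(R, C; W) / #(R, C) times prod w_ij, with w_ij = 1 / (1 - lam_ij)."""
    tabs = list(tables(R, C))
    m, n = len(R), len(C)
    W = [[1 / (1 - lam) for lam in row] for row in Lam]
    T = sum(math.prod(W[i][j] ** D[i][j] for i in range(m) for j in range(n))
            for D in tabs)
    return T / len(tabs) * math.prod(w for row in W for w in row)


def sample_psi(tabs, rng):
    """One draw of X: a uniform table D, then independent Gamma(d_ij + 1, 1) entries."""
    D = rng.choice(tabs)
    return [[rng.gammavariate(d + 1, 1.0) for d in row] for row in D]


R, C = (2, 1), (1, 2)
Lam = [[0.3, -0.2], [0.1, 0.4]]
print(mgf_via_mixture(R, C, Lam), lemma_9_1_part1(R, C, Lam))
```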

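Proof of Theorem 5.3. We use the Laplace transform method, see, for example, Appendix A of [AS92]. We have

$$\mathbf{P}\left\{\sum_{(i,j)\in S}x_{ij} > t\right\} = \mathbf{P}\left\{\exp\left\{\frac{1}{2\lambda+2}\sum_{(i,j)\in S}x_{ij}\right\} > \exp\left\{\frac{t}{2\lambda+2}\right\}\right\} \leq \exp\left\{-\frac{t}{2\lambda+2}\right\}\mathbf{E}\exp\left\{\frac{1}{2\lambda+2}\sum_{(i,j)\in S}x_{ij}\right\}$$

by the Markov inequality. By Part (1) of Lemma 9.1,

$$\mathbf{E}\exp\left\{\frac{1}{2\lambda+2}\sum_{(i,j)\in S}x_{ij}\right\} = \frac{T(R,C;W)}{\#(R,C)}\left(\frac{2\lambda+2}{2\lambda+1}\right)^{\#S},$$

where

$$w_{ij} = \begin{cases}(2\lambda+2)/(2\lambda+1) & \text{if } (i,j) \in S,\\ 1 & \text{if } (i,j) \notin S.\end{cases}$$

Clearly,

$$\left(\frac{2\lambda+2}{2\lambda+1}\right)^{\#S} \leq 2^{\#S}.$$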

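To bound the ratio of $T(R,C;W)$ and $\#(R,C)$, we use Theorems 4.1 and 4.3. Let $0 < x_1,\ldots,x_m;\ y_1,\ldots,y_n < 1$ be numbers such that

$$\rho(R,C) = \left(\prod_{i=1}^m x_i^{-r_i}\right)\left(\prod_{j=1}^n y_j^{-c_j}\right)\prod_{ij}\frac{1}{1 - x_iy_j}.$$

For the typical table $X^* = (x^*_{ij})$ we have

$$x^*_{ij} = \frac{x_iy_j}{1 - x_iy_j} \quad \text{for all } i,j.$$

Therefore,

$$x_iy_j = \frac{x^*_{ij}}{1 + x^*_{ij}} \leq \frac{\lambda}{\lambda+1} \quad \text{for all } i,j$$

and, since $w_{ij}x_iy_j \leq 2\lambda/(2\lambda+1)$,

$$w_{ij}x_iy_j < 1 \quad \text{for all } i,j.$$

Then we have

$$\rho(R,C;W) \leq \left(\prod_{i=1}^m x_i^{-r_i}\right)\left(\prod_{j=1}^n y_j^{-c_j}\right)\prod_{ij}\frac{1}{1 - w_{ij}x_iy_j}$$

and

$$\frac{\rho(R,C;W)}{\rho(R,C)} \leq \prod_{ij}\frac{1 - x_iy_j}{1 - w_{ij}x_iy_j} = \prod_{(i,j)\in S}\frac{1}{1 + (1 - w_{ij})x^*_{ij}}.$$

Now, since $1 - w_{ij} = -1/(2\lambda+1)$ and $x^*_{ij} \leq \lambda$ for $(i,j) \in S$,

$$\frac{1}{1 + (1 - w_{ij})x^*_{ij}} \leq \frac{2\lambda+1}{\lambda+1} < 2 \quad \text{for all } (i,j) \in S$$

and hence

$$\frac{\rho(R,C;W)}{\rho(R,C)} \leq 2^{\#S}.$$

Since

$$T(R,C;W) \leq \rho(R,C;W) \quad\text{and}\quad \#(R,C) \geq \rho(R,C)N^{-\gamma(m+n)},$$

the proof follows. □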
We will need the following corollary. 
(9.2) Corollary. Suppose that $m \geq n$ and that the typical table $X^* = (x^*_{ij})$ satisfies

$$x^*_{ij} \leq \lambda \quad \text{for all } i,j$$

and some $\lambda > 0$. Let $X = (x_{ij})$ be a random $m \times n$ matrix distributed in accordance with the density $\psi_{R,C}$, and let

$$u_i = \max_{j=1,\ldots,n}x_{ij}.$$

Then for some $\tau = \tau(\lambda) > 0$ we have

$$\mathbf{P}\left\{\sum_{i=1}^m u_i > (\lambda+1)\tau m\ln N\right\} \leq 4^{-m}.$$

Proof. We apply Theorem 5.3 to each of the $n^m$ subsets $S$ having exactly one entry in each row. □

We will also use an unconditional bound on the sum of all the entries of X. 

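(9.3) Lemma. We have

$$\mathbf{P}\left\{\sum_{ij}x_{ij} > 2(N + mn)\right\} \leq \left(\frac{2}{e}\right)^{N+mn}.$$

Proof. As in the proof of Theorem 5.3, we have

$$\mathbf{P}\left\{\sum_{ij}x_{ij} > 2(N + mn)\right\} = \mathbf{P}\left\{\exp\left\{\frac{1}{2}\sum_{ij}x_{ij}\right\} > \exp\{N + mn\}\right\} \leq \exp\{-(N + mn)\}\,\mathbf{E}\exp\left\{\frac{1}{2}\sum_{ij}x_{ij}\right\}$$

by Markov's inequality. By Lemma 9.1,

$$\mathbf{E}\exp\left\{\frac{1}{2}\sum_{ij}x_{ij}\right\} = \frac{T(R,C;W)}{\#(R,C)}\prod_{ij}w_{ij} = 2^{N+mn}, \quad\text{where } w_{ij} = 2 \text{ for all } i,j,$$

since every table with margins $(R,C)$ contributes $2^N$ to $T(R,C;W)$. The proof follows. □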



10. Proof of Theorem 3.2 



We start with a technical result. 

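(10.1) Lemma. Let $(R,C)$ be upper $\alpha$-smooth margins, so $r_i/N \leq \alpha/m$ and $c_j/N \leq \alpha/n$ for all $i,j$. Let $X = (x_{ij})$ be a random $m \times n$ matrix with density $\psi_{R,C}$ of Section 2.5. Then for any real $\tau$

$$\mathbf{P}\left\{\frac{1}{N}\sum_{j=1}^n c_j\ln x_{ij} < -\tau\right\} \leq \pi^{n/2}\exp\left\{-\frac{n\tau}{2\alpha}\right\} \quad \text{for } i = 1,\ldots,m$$

and

$$\mathbf{P}\left\{\frac{1}{N}\sum_{i=1}^m r_i\ln x_{ij} < -\tau\right\} \leq \pi^{m/2}\exp\left\{-\frac{m\tau}{2\alpha}\right\} \quad \text{for } j = 1,\ldots,n.$$

Proof. Let us prove the first inequality. As in the proof of Theorem 5.3, we use the Laplace transform method. We have

$$\mathbf{P}\left\{\frac{1}{N}\sum_{j=1}^n c_j\ln x_{ij} < -\tau\right\} = \mathbf{P}\left\{\prod_{j=1}^n x_{ij}^{-\lambda_j} > \exp\left\{\frac{n\tau}{2\alpha}\right\}\right\} \leq \exp\left\{-\frac{n\tau}{2\alpha}\right\}\mathbf{E}\prod_{j=1}^n x_{ij}^{-\lambda_j}, \quad\text{where } \lambda_j = \frac{nc_j}{2\alpha N}.$$

Since

$$\lambda_j \leq \frac{1}{2},$$

by Part (2) of Lemma 9.1 we deduce that

$$\mathbf{E}\prod_{j=1}^n x_{ij}^{-\lambda_j} \leq \Gamma^n\left(\frac{1}{2}\right) = \pi^{n/2}$$

(we observe that every term in the sum of Lemma 9.1 does not exceed $\Gamma^n(1/2)$). The proof of the second inequality is identical. □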
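The parenthetical remark in the proof rests on the inequality $\Gamma(d - \lambda + 1)/\Gamma(d + 1) \leq \Gamma(1/2) = \sqrt{\pi}$ for integers $d \geq 0$ and $0 \leq \lambda \leq 1/2$. It holds for two reasons:

- For $d = 0$, $\Gamma$ decreases on $(0, 1]$.
- For $d \geq 1$, $\Gamma$ increases on $[3/2, \infty)$, so the ratio is at most 1.

A quick grid check in Python follows; the function name is ours.

```python
import math


def gamma_ratio(d, lam):
    """Gamma(d - lam + 1) / Gamma(d + 1), through log-gamma to avoid overflow."""
    return math.exp(math.lgamma(d - lam + 1) - math.lgamma(d + 1))


grid = [k / 200 for k in range(101)]            # lambda in [0, 1/2]
worst = max(gamma_ratio(d, lam) for d in range(200) for lam in grid)
print(worst, math.sqrt(math.pi))                # largest value occurs at d = 0, lambda = 1/2
```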

Proof of Theorem 3.2. Without loss of generality, we assume that $m \geq n$. We recall how the function $p(X)$ is computed. Given a positive $m \times n$ matrix $X = (x_{ij})$, we compute the scaling $Y = (y_{ij})$ of $X$ to have row sums $R$ and column sums $C$. Then we compute the $N \times N$ block matrix $B(X)$ consisting of $mn$ blocks of sizes $r_i \times c_j$, with the entries in the $(i,j)$-th block equal to $y_{ij}/r_ic_j$. Thus $B(X)$ is a doubly stochastic matrix and

$$p(X) = \frac{N^N}{N!}\operatorname{per}B(X),$$

cf. Section 2.
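The construction of $B(X)$ can be made concrete with a small example. The Python sketch below uses hypothetical helper names of our own and approximates the scaling by alternate normalization. It builds the block matrix, checks that it is doubly stochastic, and evaluates $p(X) = (N^N/N!)\operatorname{per}B(X)$. By the van der Waerden bound this value is at least 1.

```python
import math
import random


def scale_to_margins(X, R, C, sweeps=3000):
    """Alternate row/column normalization of a positive X towards margins (R, C)."""
    Y = [list(row) for row in X]
    for _ in range(sweeps):
        Y = [[r * y / sum(row) for y in row] for r, row in zip(R, Y)]
        col_sums = [sum(col) for col in zip(*Y)]
        Y = [[c * y / s for y, c, s in zip(row, C, col_sums)] for row in Y]
    return Y


def permanent(A):
    """Permanent via Ryser's inclusion-exclusion formula (small N only)."""
    n = len(A)
    total = 0.0
    for mask in range(1, 1 << n):
        cols = [j for j in range(n) if mask >> j & 1]
        prod = 1.0
        for row in A:
            prod *= sum(row[j] for j in cols)
        total += (-1) ** len(cols) * prod
    return (-1) ** n * total


def block_matrix(Y, R, C):
    """B(X): the (i, j)-th block has size r_i x c_j and all entries y_ij / (r_i c_j)."""
    row_block = [i for i, r in enumerate(R) for _ in range(r)]
    col_block = [j for j, c in enumerate(C) for _ in range(c)]
    return [[Y[i][j] / (R[i] * C[j]) for j in col_block] for i in row_block]


random.seed(9)
R, C = [2, 1, 3], [3, 3]                        # N = 6
X = [[random.uniform(0.2, 1.0) for _ in C] for _ in R]
B = block_matrix(scale_to_margins(X, R, C), R, C)
N = sum(R)
p = N ** N / math.factorial(N) * permanent(B)
print(p)
```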

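We are going to use Theorem 5.2 to bound the entries of $Y$. By Lemma 9.3,

$$\mathbf{P}\left\{\sum_{ij}x_{ij} \leq 2(N + mn)\right\} \geq 1 - \left(\frac{2}{e}\right)^{N+mn}.$$

Since $N \leq s_0mn$, $r_i/N \leq \alpha/m$, and $c_j/N \leq \alpha/n$, we conclude that for some $\kappa_1 = \kappa_1(\alpha, s_0) = 2\alpha^2(s_0 + 1)$ we have

$$\mathbf{P}\left\{\frac{1}{N^2}\sum_{ij}r_ic_jx_{ij} \leq \kappa_1\right\} \geq 1 - \left(\frac{2}{e}\right)^{N+mn}.$$

From Lemma 10.1, for a sufficiently large $\kappa_2 = \kappa_2(\alpha)$, we have

$$\mathbf{P}\left\{\frac{1}{N}\sum_{j=1}^n c_j\ln x_{pj} > -\kappa_2\right\} \geq 1 - 4^{-n} \quad \text{for all } p = 1,\ldots,m$$

and

$$\mathbf{P}\left\{\frac{1}{N}\sum_{i=1}^m r_i\ln x_{iq} > -\kappa_2\right\} \geq 1 - 4^{-m} \quad \text{for } q = 1,\ldots,n.$$

Therefore, by Theorem 5.2, we have for some $\kappa = \kappa(\alpha, s_0)$

$$\mathbf{P}\left\{y_{pq} \leq \kappa\frac{r_pc_q}{N}x_{pq} \text{ for all } p,q\right\} \geq 1 - \left(\frac{2}{e}\right)^{N+nm} - m4^{-n} - n4^{-m}.$$

Now, $B$ consists of $mn$ blocks, the $(p,q)$-th block filled by the entries $y_{pq}/r_pc_q$. Therefore the probability that for all $i,j = 1,\ldots,N$ we have

$$b_{ij} \leq \frac{\kappa}{N}x_{pq} \quad \text{provided } (i,j) \text{ lies in the } (p,q)\text{-th block of } B \tag{10.2}$$

is at least

$$1 - \left(\frac{2}{e}\right)^{N+nm} - m4^{-n} - n4^{-m}.$$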
We now bound $\operatorname{per}B(X)$ using Lemma 5.1 and Corollary 9.2. Let

$$\xi_i = \max_{j=1,\ldots,N}b_{ij} \quad \text{for } i = 1,\ldots,N \quad\text{and let}\quad u_p = \max_{q=1,\ldots,n}x_{pq}.$$

Then, from (10.2) we have

$$\sum_{i=1}^N\xi_i \leq \frac{\kappa}{N}\sum_{p=1}^m r_pu_p \leq \frac{\alpha\kappa}{m}\sum_{p=1}^m u_p.$$

By Corollary 9.2, for some $\tau_1 = \tau_1(\alpha, s_0)$, we have

$$\mathbf{P}\left\{\sum_{p=1}^m u_p \leq \tau_1m\ln N\right\} \geq 1 - 4^{-m}.$$

Thus for some $\tau = \tau(\alpha, s_0)$ we have

$$\mathbf{P}\left\{\sum_{i=1}^N\xi_i \leq \tau\ln N\right\} \geq 1 - \left(\frac{2}{e}\right)^{N+mn} - m4^{-n} - n4^{-m} - 4^{-m},$$

and the proof follows by Lemma 5.1. □

The rest of the paper deals with the proof of Theorem 3.3. This requires sharpening of the estimates of Lemma 10.1. Roughly, we need to prove that with overwhelming probability

$$\frac{1}{N}\sum_{j=1}^n c_j\ln x_{ij} \geq -\tau + \ln s \quad\text{and}\quad \frac{1}{N}\sum_{i=1}^m r_i\ln x_{ij} \geq -\tau + \ln s$$

for some constant $\tau = \tau(\alpha,\beta)$, where $s = N/mn$ is the average entry of the table.

11. An estimate of a sum over tables 

To sharpen the estimates of Lemma 10.1 we need a more careful estimate of the 
sum in Part (2) of Lemma 9.1. In this section, we prove the following technical 
result. 
(11.1) Proposition. Suppose that $(R,C)$ are lower $\beta$-smooth and upper $\alpha$-smooth margins and that

$$s = N/mn \geq 1.$$

Let $\lambda_1,\ldots,\lambda_m \leq 1/2$ be numbers and let $\ell = \lambda_1 + \ldots + \lambda_m$. Then, for $k \leq n$ we have

$$\frac{1}{\#(R,C)}\sum_{D=(d_{ij})}\prod_{\substack{1\leq i\leq m\\1\leq j\leq k}}\frac{\Gamma(d_{ij} - \lambda_i + 1)}{\Gamma(d_{ij} + 1)} \leq \delta^{km}N^{\gamma(m+n)}s^{-k\ell},$$

where the sum is taken over all non-negative integer matrices $D$ with row sums $R$ and column sums $C$, $\delta = \delta(\alpha,\beta) > 0$, and $\gamma$ is the absolute constant of Theorem 4.1.

We start with computing a simplified version of this sum in a closed form. 

(11.2) Definition. Let us fix positive integers $c$ and $m$. The integer simplex $\Upsilon(m,c)$ is the set of all non-negative integer vectors $d = (d_1,\ldots,d_m)$ such that $d_1 + \ldots + d_m = c$.

Clearly,

$$\#\Upsilon(m,c) = \binom{m + c - 1}{m - 1}.$$

A sum over $\Upsilon(m,c)$ similar to that of Proposition 11.1 can be computed in a closed form.

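(11.3) Lemma. Let $\lambda_i < 1$, $i = 1,\ldots,m$, be numbers and let $\ell = \lambda_1 + \ldots + \lambda_m$. Then

$$\frac{1}{\#\Upsilon(m,c)}\sum_{\substack{d_1,\ldots,d_m\geq 0\\ d_1+\ldots+d_m = c}}\prod_{i=1}^m\frac{\Gamma(d_i - \lambda_i + 1)}{\Gamma(d_i + 1)} = \frac{\Gamma(c + m - \ell)\,\Gamma(m)}{\Gamma(c + m)\,\Gamma(m - \ell)}\prod_{i=1}^m\Gamma(1 - \lambda_i).$$

Proof. Let us define a function $h_c$ on the positive orthant $\mathbb{R}^m_+$ by the formula

$$h_c(x) = \frac{(m-1)!}{(m+c-1)!}(\xi_1 + \ldots + \xi_m)^c\exp\left\{-\sum_{i=1}^m\xi_i\right\} \quad \text{for } x = (\xi_1,\ldots,\xi_m) \in \mathbb{R}^m_+.$$

Since

$$(\xi_1 + \ldots + \xi_m)^c = \sum_{\substack{d_1,\ldots,d_m\geq 0\\ d_1+\ldots+d_m = c}}\frac{c!}{d_1!\cdots d_m!}\,\xi_1^{d_1}\cdots\xi_m^{d_m}$$

and

$$\int_{\mathbb{R}^m_+}\prod_{i=1}^m\xi_i^{d_i - \lambda_i}\exp\left\{-\sum_{i=1}^m\xi_i\right\}dx = \prod_{i=1}^m\Gamma(d_i - \lambda_i + 1),$$

we can rewrite

$$h_c(x) = \frac{1}{\#\Upsilon(m,c)}\sum_{\substack{d_1,\ldots,d_m\geq 0\\ d_1+\ldots+d_m = c}}\prod_{i=1}^m\frac{\xi_i^{d_i}}{d_i!}\exp\left\{-\sum_{i=1}^m\xi_i\right\}.$$

Therefore,

$$\frac{1}{\#\Upsilon(m,c)}\sum_{\substack{d_1,\ldots,d_m\geq 0\\ d_1+\ldots+d_m = c}}\prod_{i=1}^m\frac{\Gamma(d_i - \lambda_i + 1)}{\Gamma(d_i + 1)} = \int_{\mathbb{R}^m_+}h_c(x)\prod_{i=1}^m\xi_i^{-\lambda_i}\,dx.$$

Let $Q$ be the simplex $\xi_1 + \ldots + \xi_m = 1$, $\xi_i \geq 0$, with the Lebesgue measure $dx$ normalized to the probability measure. Since the function

$$(\xi_1 + \ldots + \xi_m)^c\prod_{i=1}^m\xi_i^{-\lambda_i}$$

is positive homogeneous of degree $c - \ell$, we can write

$$\int_{\mathbb{R}^m_+}h_c(x)\prod_{i=1}^m\xi_i^{-\lambda_i}\,dx = \frac{e\,\Gamma(c + m - \ell)}{\Gamma(m)}\int_Q h_c(x)\prod_{i=1}^m\xi_i^{-\lambda_i}\,dx. \tag{11.3.1}$$

On the other hand,

$$h_c(x) = \frac{\Gamma(m)}{\Gamma(c + m)}h_0(x) \quad \text{for } x \in Q. \tag{11.3.2}$$

Using (11.3.1) with $c = 0$, we deduce that

$$\int_Q h_0(x)\prod_{i=1}^m\xi_i^{-\lambda_i}\,dx = \frac{\Gamma(m)}{e\,\Gamma(m - \ell)}\int_{\mathbb{R}^m_+}h_0(x)\prod_{i=1}^m\xi_i^{-\lambda_i}\,dx = \frac{\Gamma(m)}{e\,\Gamma(m - \ell)}\prod_{i=1}^m\Gamma(1 - \lambda_i).$$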

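Now, from (11.3.1) and (11.3.2), we have

$$\int_{\mathbb{R}^m_+}h_c(x)\prod_{i=1}^m\xi_i^{-\lambda_i}\,dx = \frac{e\,\Gamma(c + m - \ell)}{\Gamma(c + m)}\int_Q h_0(x)\prod_{i=1}^m\xi_i^{-\lambda_i}\,dx = \frac{\Gamma(c + m - \ell)\,\Gamma(m)}{\Gamma(c + m)\,\Gamma(m - \ell)}\prod_{i=1}^m\Gamma(1 - \lambda_i),$$

as desired. □

We need an estimate.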
(11.4) Corollary. Suppose that $\lambda_i \leq 1/2$ for $i = 1,\ldots,m$ and $c \geq \beta m$ for some $\beta > 0$. Then

$$\frac{1}{\#\Upsilon(m,c)}\sum_{\substack{d_1,\ldots,d_m\geq 0\\ d_1+\ldots+d_m = c}}\prod_{i=1}^m\frac{\Gamma(d_i - \lambda_i + 1)}{\Gamma(d_i + 1)} \leq \delta^m\left(\frac{m}{c}\right)^{\ell}$$

for some constant $\delta = \delta(\beta) > 0$, where $\ell = \lambda_1 + \ldots + \lambda_m$.

Proof. The proof follows from Lemma 11.3. □
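The closed form of Lemma 11.3,

$$\frac{1}{\#\Upsilon(m,c)}\sum_{d\in\Upsilon(m,c)}\prod_i\frac{\Gamma(d_i-\lambda_i+1)}{\Gamma(d_i+1)} = \frac{\Gamma(c+m-\ell)\Gamma(m)}{\Gamma(c+m)\Gamma(m-\ell)}\prod_i\Gamma(1-\lambda_i),$$

together with the count $\#\Upsilon(m,c) = \binom{m+c-1}{m-1}$, can be confirmed by brute force for small $m$ and $c$. The Python sketch below uses function names of our own. The values of $\lambda_i$ are arbitrary test data satisfying $\lambda_i < 1$.

```python
import itertools
import math


def simplex_average(c, lams):
    """Average over Upsilon(m, c) of prod_i Gamma(d_i - lam_i + 1) / Gamma(d_i + 1).

    Returns (average, number of points of the integer simplex)."""
    total, count = 0.0, 0
    for d in itertools.product(range(c + 1), repeat=len(lams)):
        if sum(d) != c:
            continue
        count += 1
        total += math.prod(math.gamma(di - li + 1) / math.gamma(di + 1)
                           for di, li in zip(d, lams))
    return total / count, count


def closed_form(c, lams):
    """Right-hand side of Lemma 11.3."""
    m, ell = len(lams), sum(lams)
    return (math.gamma(c + m - ell) * math.gamma(m)
            / (math.gamma(c + m) * math.gamma(m - ell))
            * math.prod(math.gamma(1 - lam) for lam in lams))


lams = [0.5, -0.3, 0.2, 0.45]
for c in (0, 1, 3, 5):
    avg, count = simplex_average(c, lams)
    print(c, count, avg, closed_form(c, lams))
```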



Fix margins $R = (r_1,\ldots,r_m)$ and $C = (c_1,\ldots,c_n)$ and a number $k \leq n$. Pick, uniformly at random, a contingency table $D = (d_{ij})$ with margins $(R,C)$ and consider its submatrix $Z$ consisting of the first $k$ columns. Hence $Z$ is an $m \times k$ non-negative integer matrix with the column sums $c_1,\ldots,c_k$. We interpret $Z$ as a point in the product

$$\Upsilon = \Upsilon(m,c_1) \times \cdots \times \Upsilon(m,c_k)$$

of integer simplices. This process induces a certain distribution on the set $\Upsilon$ of non-negative integer $m \times k$ matrices with the column sums $c_1,\ldots,c_k$. We want to compare this distribution with the uniform distribution. Lemma 11.5 below says that the probability to get any particular matrix $Z \in \Upsilon$ cannot exceed the uniform probability by much if the margins $(R,C)$ are smooth.

Once we fix the $m \times k$ submatrix $Z$ consisting of the first $k$ columns of a table with margins $(R,C)$, the complementary $m \times (n-k)$ table has two sets of margins. Its row sums are $R' = R - R(Z)$, where $R(Z)$ is the vector of row sums of $Z$. Its column sums are $C' = (c_{k+1},\ldots,c_n)$, the truncation of $C$. Hence the probability of obtaining a particular $Z \in \Upsilon$ is

$$\frac{\#(R',C')}{\#(R,C)},$$

where the ratio is declared to be 0 if $R'$ is not non-negative.
We prove the following estimate. 

(11.5) Lemma. Consider margins $(R,C)$ satisfying the constraints of Proposition 11.1. Fix $k \leq n$ and let $\Upsilon$ be the set of all $m \times k$ non-negative integer matrices with the column sums $c_1,\ldots,c_k$.

Let $C' = (c_{k+1},\ldots,c_n)$, choose $Z \in \Upsilon$ and set $R' = R - R(Z)$, where $R(Z)$ is the vector of the row sums of $Z$. Then

$$\frac{\#(R',C')}{\#(R,C)} \leq \frac{\delta^{km}N^{\gamma(m+n)}}{\#\Upsilon}$$

for some constant $\delta = \delta(\alpha,\beta) > 0$, where $\gamma > 0$ is an absolute constant from Theorem 4.1.
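Proof. Let $\rho(R,C)$ be the quantity of Theorem 4.1. Here we agree that $\rho(R',C') = 0$ if $R'$ has negative components, and that "max" and "min" are replaced by "sup" and "inf" respectively if $R'$ is non-negative but has zero components.

Let $0 < x_1,\ldots,x_m < 1$ and $0 < y_1,\ldots,y_n < 1$ be an optimal point in Theorem 4.1, so

$$\rho(R,C) = \left(\prod_{i=1}^m x_i^{-r_i}\right)\left(\prod_{j=1}^n y_j^{-c_j}\right)\prod_{\substack{1\leq i\leq m\\1\leq j\leq n}}\frac{1}{1 - x_iy_j}.$$

Then

$$\rho(R',C') \leq \left(\prod_{i=1}^m x_i^{-r'_i}\right)\left(\prod_{j=k+1}^n y_j^{-c_j}\right)\prod_{\substack{1\leq i\leq m\\k+1\leq j\leq n}}\frac{1}{1 - x_iy_j} \leq \left(\prod_{i=1}^m x_i^{-r_i}\right)\left(\prod_{j=1}^n y_j^{-c_j}\right)\prod_{\substack{1\leq i\leq m\\k+1\leq j\leq n}}\frac{1}{1 - x_iy_j}$$

and hence

$$\frac{\rho(R',C')}{\rho(R,C)} \leq \prod_{\substack{1\leq i\leq m\\1\leq j\leq k}}(1 - x_iy_j).$$

Now, by Part (1) of Theorem 3.5, the typical table $X^* = (x^*_{ij})$ satisfies

$$x^*_{ij} = \frac{x_iy_j}{1 - x_iy_j} \quad\text{and}\quad x^*_{ij} \geq \delta_1s \quad \text{for all } i,j$$

and for some $\delta_1 = \delta_1(\alpha,\beta)$. This implies that

$$1 - x_iy_j = \frac{1}{1 + x^*_{ij}} \leq \frac{1}{1 + \delta_1s} \quad \text{for all } i,j.$$

Summarizing,

$$\frac{\rho(R',C')}{\rho(R,C)} \leq \left(\frac{1}{1 + \delta_1s}\right)^{km}.$$

Now,

$$\#\Upsilon = \prod_{j=1}^k\binom{c_j + m - 1}{m - 1} \leq \prod_{j=1}^k\binom{c_j + m}{m} \leq \prod_{j=1}^k\left(\frac{c_j + m}{c_j}\right)^{c_j}\left(\frac{c_j + m}{m}\right)^m.$$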
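We have

$$\left(\frac{c_j + m}{c_j}\right)^{c_j} \leq e^m.$$

Furthermore, since $c_j \leq \alpha sm$, we have

$$\left(\frac{c_j + m}{m}\right)^m \leq (1 + \alpha s)^m$$

and

$$\#\Upsilon\,\frac{\rho(R',C')}{\rho(R,C)} \leq \left(\frac{e(1 + \alpha s)}{1 + \delta_1s}\right)^{km}.$$

Since by Theorem 4.1 we have

$$\#(R,C) \geq N^{-\gamma(m+n)}\rho(R,C) \quad\text{and}\quad \#(R',C') \leq \rho(R',C'),$$

the proof follows. □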

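Proof of Proposition 11.1. Let $\Upsilon(m,c_j)$ be the integer simplex of non-negative integer vectors summing up to $c_j$ and let

$$\Upsilon = \Upsilon(m,c_1) \times \cdots \times \Upsilon(m,c_k).$$

Using Lemma 11.5, we bound

$$\frac{1}{\#(R,C)}\sum_{D=(d_{ij})}\prod_{\substack{1\leq i\leq m\\1\leq j\leq k}}\frac{\Gamma(d_{ij} - \lambda_i + 1)}{\Gamma(d_{ij} + 1)} = \sum_{Z=(z_{ij})\in\Upsilon}\frac{\#(R - R(Z), C')}{\#(R,C)}\prod_{\substack{1\leq i\leq m\\1\leq j\leq k}}\frac{\Gamma(z_{ij} - \lambda_i + 1)}{\Gamma(z_{ij} + 1)} \leq \frac{\delta_1^{km}N^{\gamma(m+n)}}{\#\Upsilon}\sum_{Z=(z_{ij})\in\Upsilon}\prod_{\substack{1\leq i\leq m\\1\leq j\leq k}}\frac{\Gamma(z_{ij} - \lambda_i + 1)}{\Gamma(z_{ij} + 1)}$$

for some $\delta_1 = \delta_1(\alpha,\beta)$. The sum

$$\frac{1}{\#\Upsilon}\sum_{Z=(z_{ij})\in\Upsilon}\prod_{\substack{1\leq i\leq m\\1\leq j\leq k}}\frac{\Gamma(z_{ij} - \lambda_i + 1)}{\Gamma(z_{ij} + 1)}$$

is just the product of $k$ sums of the type

$$\frac{1}{\#\Upsilon(m,c_j)}\sum_{\substack{d_1,\ldots,d_m\geq 0\\ d_1+\ldots+d_m = c_j}}\prod_{i=1}^m\frac{\Gamma(d_i - \lambda_i + 1)}{\Gamma(d_i + 1)} \leq \delta_2^m\left(\frac{m}{c_j}\right)^{\ell}$$

by Corollary 11.4, for some $\delta_2 = \delta_2(\alpha,\beta)$. The proof now follows. □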
12. Proof of Theorem 3.3

Fix margins $(R,C)$ and let $X = (x_{ij})$ be the $m \times n$ random matrix with density $\psi = \psi_{R,C}$ of Section 2.5. Define random variables

$$h_i = \frac{1}{N}\sum_{j=1}^n c_j\ln x_{ij} \quad \text{for } i = 1,\ldots,m \quad\text{and}\quad v_j = \frac{1}{N}\sum_{i=1}^m r_i\ln x_{ij} \quad \text{for } j = 1,\ldots,n.$$



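(12.1) Lemma. Let $(R,C)$ be lower $\beta$-smooth upper $\alpha$-smooth margins such that

$$s = N/mn \geq 1.$$

Choose a subset $J \subset \{1,\ldots,n\}$ of indices, $\#J = k$. Then for all $t > 0$ we have

$$\mathbf{P}\left\{\frac{1}{k}\sum_{j\in J}v_j < -t + \ln s\right\} \leq \exp\left\{-\frac{tkm}{2\alpha}\right\}\delta^{km}N^{\gamma(m+n)}.$$

Similarly, for a subset $I \subset \{1,\ldots,m\}$ of indices, $\#I = k$, we have

$$\mathbf{P}\left\{\frac{1}{k}\sum_{i\in I}h_i < -t + \ln s\right\} \leq \exp\left\{-\frac{tkn}{2\alpha}\right\}\delta^{kn}N^{\gamma(m+n)}$$

for some number $\delta = \delta(\alpha,\beta) > 0$ and the absolute constant $\gamma > 0$ of Theorem 4.1.

Proof. Without loss of generality, it suffices to prove only the first bound and only in the case of $J = \{1,\ldots,k\}$.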

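We use the Laplace transform method. We have

$$\mathbf{P}\left\{\frac{1}{k}\sum_{j=1}^k v_j < -t + \ln s\right\} = \mathbf{P}\left\{-\frac{m}{2\alpha}\sum_{j=1}^k v_j > \frac{tkm}{2\alpha} - \frac{km\ln s}{2\alpha}\right\} \leq s^{km/2\alpha}\exp\left\{-\frac{tkm}{2\alpha}\right\}\mathbf{E}\exp\left\{-\frac{m}{2\alpha}\sum_{j=1}^k v_j\right\}.$$

Let

$$\lambda_i = \frac{mr_i}{2\alpha N} \leq \frac{1}{2} \quad \text{for } i = 1,\ldots,m \quad\text{and}\quad \ell = \lambda_1 + \ldots + \lambda_m = \frac{m}{2\alpha}.$$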
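Using Part (2) of Lemma 9.1, we write

$$\mathbf{E}\exp\left\{-\frac{m}{2\alpha}\sum_{j=1}^k v_j\right\} = \mathbf{E}\prod_{\substack{1\leq i\leq m\\1\leq j\leq k}}x_{ij}^{-\lambda_i} = \frac{1}{\#(R,C)}\sum_D\prod_{\substack{1\leq i\leq m\\1\leq j\leq k}}\frac{\Gamma(d_{ij} - \lambda_i + 1)}{\Gamma(d_{ij} + 1)},$$

where the sum is taken over all contingency tables $D$ with margins $(R,C)$.

The proof now follows by Proposition 11.1, since $s^{km/2\alpha}s^{-k\ell} = 1$. □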

We will use the following corollary. 

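(12.2) Corollary. Let $(R,C)$ be lower $\beta$-smooth upper $\alpha$-smooth margins such that $s = N/mn \geq 1$. Suppose further that $m \leq \rho n$ and $n \leq \rho m$ for some $\rho \geq 1$. Then for some $\tau = \tau(\alpha,\beta,\rho) > 0$ we have

$$\mathbf{P}\left\{\#\{i:\ h_i < -\tau + \ln s\} > \ln N\right\} \leq 4^{-n} \quad\text{and}\quad \mathbf{P}\left\{\#\{j:\ v_j < -\tau + \ln s\} > \ln N\right\} \leq 4^{-m}.$$

Proof. We introduce random sets

$$I = \{i:\ h_i < -\tau + \ln s\} \quad\text{and}\quad J = \{j:\ v_j < -\tau + \ln s\}$$

and note that

$$\frac{1}{\#I}\sum_{i\in I}h_i < -\tau + \ln s \quad\text{and}\quad \frac{1}{\#J}\sum_{j\in J}v_j < -\tau + \ln s.$$

The proof now follows from Lemma 12.1. □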

Proof of Theorem 3.3. The proof is a modification of that of Theorem 3.2. We recall that

$$p(X) = \frac{N^N}{N!}\operatorname{per}B(X),$$

where $B(X)$ is the $N \times N$ doubly stochastic matrix constructed as follows. We scale the $m \times n$ matrix $X$ to the matrix $Y$ with row sums $R$ and column sums $C$. We then let $b_{ij} = y_{pq}/r_pc_q$ provided the entry $(i,j)$ lies in the $(p,q)$-th block of $B(X)$ of size $r_p \times c_q$.

We are going to bound the entries of $Y$. First, without loss of generality we assume that $s = N/mn \geq 1$, since the case of $s < 1$ is treated in Theorem 3.2. As in the proof of Theorem 3.2 we conclude that

$$\mathbf{P}\left\{\frac{1}{N^2}\sum_{ij}r_ic_jx_{ij} \leq 2\alpha^2(s + 1)\right\} \geq 1 - \left(\frac{2}{e}\right)^{N+mn}. \tag{12.3}$$



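Let

$$h_p = \frac{1}{N}\sum_{j=1}^n c_j\ln x_{pj} \quad \text{for } p = 1,\ldots,m \quad\text{and}\quad v_q = \frac{1}{N}\sum_{i=1}^m r_i\ln x_{iq} \quad \text{for } q = 1,\ldots,n.$$

Choose $\tau > 0$ as in Corollary 12.2. Set

$$P = \{p:\ h_p < -\tau + \ln s\} \quad\text{and}\quad Q = \{q:\ v_q < -\tau + \ln s\}.$$

Thus the probability that

$$\#P \leq \ln N \quad\text{and}\quad \#Q \leq \ln N$$

is at least

$$1 - 4^{-m} - 4^{-n}.$$

If $p \notin P$ and $q \notin Q$ and (12.3) holds then by Theorem 5.2,

$$y_{pq} \leq \frac{\delta_1r_pc_q}{sN}x_{pq}$$

for some $\delta_1(\alpha,\beta) > 0$. If $p \in P$ or $q \in Q$ then

$$y_{pq} \leq \min\{r_p, c_q\}.$$

Consequently, for $b_{ij}$ with $(i,j)$ in the $(p,q)$-th block we have

$$b_{ij} \leq \frac{\delta_1}{sN}x_{pq} \quad \text{if } p \notin P \text{ and } q \notin Q$$

and

$$b_{ij} \leq \min\left\{\frac{1}{r_p}, \frac{1}{c_q}\right\} \quad \text{if } p \in P \text{ or } q \in Q.$$

As in the proof of Theorem 3.2, we let

$$\xi_i = \max_{j=1,\ldots,N}b_{ij} \quad \text{for } i = 1,\ldots,N \quad\text{and let}\quad u_p = \max_{q=1,\ldots,n}x_{pq}.$$

We estimate that

$$\xi_i \leq \frac{1}{r_p}$$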
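if $i$ lies in the $p$-th row block with $p \in P$, and we estimate that

$$\xi_i \leq \frac{\delta_1}{sN}u_p + \max_{q\in Q}\frac{y_{pq}}{r_pc_q}$$

if $i$ lies in the row block $p \notin P$. Hence

$$\sum_{i=1}^N\xi_i \leq \#P + \frac{\alpha\delta_1}{sm}\sum_{p=1}^m u_p + \sum_{p=1}^m\max_{q\in Q}\frac{y_{pq}}{c_q}.$$

By Corollary 9.2,

$$\mathbf{P}\left\{\sum_{p=1}^m u_p > \tau_1sm\ln N\right\} \leq 4^{-m}$$

for some $\tau_1 = \tau_1(\alpha)$, and hence

$$\mathbf{P}\left\{\frac{\alpha\delta_1}{sm}\sum_{p=1}^m u_p \leq \delta_2\ln N\right\} \geq 1 - 4^{-m}$$

for some $\delta_2 = \delta_2(\alpha,\beta)$. Finally,

$$\sum_{p=1}^m\max_{q\in Q}\frac{y_{pq}}{c_q} \leq \sum_{p=1}^m\sum_{q\in Q}\frac{y_{pq}}{c_q} = \#Q.$$

Summarizing,

$$\mathbf{P}\left\{\sum_{i=1}^N\xi_i \leq \delta\ln N\right\} \geq 1 - \left(\frac{2}{e}\right)^{N+mn} - 4^{-n} - 2\cdot4^{-m}$$

for some $\delta = \delta(\alpha,\beta,\rho) > 0$, and the proof is completed as in Theorem 3.2. □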